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Preface 


The  purpose  of  this  is  to  introduce  the  concept  of  lateral  mhibition  as  a 
generalized  technique  for  compressing  time/frequency  representations  of  electromagnetic 
and  acoustical  signals,  particularly  speech.  This  requires  at  least  a  rudimentary  treatment 
of  the  theory  of  frames — which  generalizes  most  commonly  known  time/frequency 
distributions — the  biology  of  hearing,  and  digital  signal  processing.  As  such,  this 
material,  along  with  the  interrelationships  of  the  disparate  subjects,  is  presented  in  a  tutorial 
style.  This  may  leave  the  mathematician  longing  for  more  rigor,  the  neurophysiological 
psychologist  longing  for  more  substantive  support  of  the  hypotheses  presented,  and  the 
engineer  longing  for  a  reprieve  from  the  theoretical  barrage.  Despite  the  problems  that 
arise  when  trying  to  appeal  to  too  wide  an  audience,  I  hope  this  thesis  is  a  cogent  analysis 
of  the  compression  of  time/frequency  distributions  via  lateral  inhibitory  networks. 


I  want  to  thank  my  thesis  committee  for  their  guidance,  which  was  instrumental  in 
helping  me  cohere  the  topics  investigated.  Individual  thanks  to  Col.  David  Norman  for 
his  immediate  and  helpful  feedback  regarding  questions  in  digital  signal  processing.  Dr. 
Bruce  Suter  and  his  colleague  Dr.  Mark  Oxley  for  the  hours  they  devoted  in  reviewing 
the  wavelet/frames  mathematical  literature  with  me,  and  to  Maj.  Steve  Rogers  for  his 
superb  insight  and  his  ability  to  immediately  pick  out  the  weaknesses  and  strengths  of 
the  research  as  it  progressed.  Most  importantly,  my  deepest  and  most  sincere  thanks  to 
my  advisor.  Dr.  Matthew  Kabrisky,  which  has  made  my  AFIT  experience  so  rewarding. 
Without  his  broad  knowledge  in  biology  and  engineering  this  research  effort  would  not 
have  been  the  success  I  feel  it  is.  I  have  truly  enjoyed  our  many  lively  discussions  and 
gedanken  experiments.  Final  thanks  to  my  friends  Capt.  Dennis  Ruck  and  Capt.  Greg 
Tarr  for  sharing  their  expertise  in  Unix  and  C  programming  with  me,  and  to  Pam  Young 
for  her  administrative  assistance  and  her  cheerful  disposition. 
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Abstract 

A  generalized  method  of  compressing  speech  using  lateral  inhibition  networks 
(LINs)  is  proposed  in  this  thesis.  Speech  signals  are  first  decomposed  using  three 
time/frequency  transforms;  the  short-time  Fourier  transform,  the  Gabor  transform,  and  the 
affine  wavelet  transform.  Redundant  time/frequency  coefficients  are  then  automatically 
eliminated  using  the  dynamics  of  a  LIN.  The  speech  is  finally  resynthesized  from  the 
compressed  representations  and  tested  for  intelligibility.  The  LIN  is  modeled  as  a  system 
of  shunting  non-linear  differential  equations  in  the  form  of  the  neuronal  cell  membrane 
equations  first  found  by  Hodgkin  and  Huxley.  Thus,  the  LIN  can  be  described  as  a 
neural  network.  LINs  perform  several  functions;  the  most  important  being  contrast 
enhancement.  In  contrast  enhancement,  spatial  peaks  and  edges  as  well  as  temporal 
changes  are  highlighted.  The  best  results  were  obtained  from  the  compressed  short-time 
Fourier  spectrum.  Nearly  30  times  compression — which  relates  to  over  95%  elimination 
of  the  spectrum — resulted  in  clearly  intelligible  speech.  However,  elimination  of  only  a 
small  percentage  of  the  affine  wavelet  spectrum  produces  wide  band  noise  that  is  offensive 
to  the  hearing  mechanisms. 
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Speech  Coding  and  Compression  Using  Wavelets 

and 

Lateral  Inhibitory  Networks 


I.  Introduction 


Background 

The  primary  method  of  communication  between  humans  is  speech.  Speech  is  so 
natural  that  people  have  demanded  from  science  and  technology  the  means  by  which  the 
human  voice  can  be  transmitted  in  communication  systems.  As  far  as  the  technology 
is  concerned,  easier,  more  efficient  communication  methods  are  possible — Morse-code 
transmission  for  instance.  However,  these  alternative  methods  require  practice  and  training 
by  the  user.  It  is  much  more  convenient  for  a  person  to  pick  up  a  telephone  and  simply 
talk  rather  than  tap  codes  on  a  key.  With  increasing  reliance  on  machines,  particularly 
computers,  the  demand  for  natural  communication  systems  between  machines  and  their 
operators  has  grown  as  well. 

The  demands  for  natural  speech  communication  systems  have  created  specific 
speech  processing  applications.  Some  of  the  most  common  applications  include  speech 
compression,  for  more  efficient  transmission  and  storage  of  the  speech  signal;  speech 
synthesis,  for  voice  response  systems;  speech  recognition;  speaker  recognition  and 
verification;  speech  processing  aids  for  the  hearing  impaired;  and  speech  enhancement  of 
noisy  or  mutilated  speech  [4, 38, 49, 53, 58]. 

The  fundamental  quesuon  in  speech  processing,  regardless  of  the  application  is: 
how  is  the  speech  signal  to  be  characterized  or  represented?  How  the  speech  signal  is 
represented  is  crucial,  for  that  representation  must  contain  the  information  conveyed  by 
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the  speaker  in  a  way  that  can  be  extracted  by  the  given  system.  Rabiner  and  Schafer 
proposed  two  major  considerations  in  a  speech  processing  system: 


1.  Preservation  of  the  message  content  in  the  speech  signal. 

2.  Representation  of  the  speech  signal  in  a  form  that  is  flexible  so  that 
modiflcations  may  be  made  to  the  speech  signal  without  seriously 
degrading  the  message  content  [58:2] 

These  considerations  are  die  cruces  of  this  research. 

The  question  of  representation  takes  on  the  greatest  meaning  in  automatic  speech 
recognition.  After  nearly  half  a  centuiy  of  research,  the  recognition  of  natural  speech  is 
still  mostly  an  unsolved  problem.  In  a  recent  review,  Lippmann  [42]  reported  that  the  best 
recognizers  are  able  to  achieve  only  50%  sentence  accuracy.  Because  of  the  slow  progress 
in  resolving  the  major  problems  in  speech  recognition,  some  researchers  are  questioning 
the  traditional  speech  representation  schemes  [55].  These  traditional  schemes  revolve 
around  the  short-time  Fourier  transform,  and  Linear  Predictive  Coding  (LPC)  [53,  58]. 
Instead,  calls  for  studying  more  natural  speech  encoding  methods  are  growing. 

Recently,  researchers  have  studied  the  output  of  inner  ear  models  as  feature 
sets  (representations)  for  recognition  systems  [42,  55].  Lippmann  [42]  reported  initial 
comparative  studies  tliat  show  biologically  motivated  preprocessors  are  significantly 
increasing  the  performance  of  current  recognizers,  particularly  in  the  presence  of  noise. 
However,  the  inner  ear  models  are  very  computationally  intensive  and  are  not  useful  for 
practical  systems. 

In  this  thesis,  an  alternative  to  a  full  inner  ear  model  is  examined.  This  alternative  is 
the  compressed  wavelet  representation  via  a  nonlinear  lateral  inhibitory  network  (LIN)  (see 
Chapters  II  and  III).  Briefly,  the  wavelet  representation  of  speech — indeed,  of  an  arbitrary 
signal — ^may  be  defined  as  the  summation  of  a  finite  number  of  weighted  wavelets  or 
elementary  signals  shifted  in  time  and  frequency,  or  shifted  in  time  and  dilated — depending 
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on  the  class  of  wavelet  [66].  The  wavelets  are  defined  locally  in  time  and  modulated; 
thus,  this  representation  describes  the  local  frequency  variations  of  a  signal. 

Besides  sharing  important  characteristics  with  biological  models,  the  wavelet 
representation  can  be  computed  using  algorithms  of  complexity  comparable  to  the  fast 
Fourier  transform  (FFT).  Therefore,  this  representation  has  the  potential  of  providing 
the  benefits  of  inner  ear  model  feature  sets  at  a  significantly  reduced  computational 
cost.  Furthermore,  LINs  provide  additional  characteristics,  such  as  automatic  gain  control 
(AGC),  enhancement  of  signals  in  noise,  and  substantia]  information  reduction  capabilities 
that  may  prove  useful  in  the  coding  and  compression  of  speech. 

Recent  mathematical  advances  have  generalized  a  wide  range  of  time/frequency 
analyses  by  means  of  wavelets  and  their  associated  frames.  Frames  and  frame  operators, 
along  with  specific  examples,  are  defined  in  the  following  chapter.  For  now  it  is  sufficient 
to  think  of  these  as  an  umbrella  description  for  time/frequency  representations.  One  of 
the  main  goals  of  this  research  is  to  match  this  general  description  of  signals — speech 
or  otherwise — with  a  general  approach  to  compressing  these  representations  based  on 
proven  biological  methods.  Indeed,  the  methods  developed  here  for  compressing  three 
specific  wavelet  representations  of  speech  can  be  generalized  to  compress  any  wavelet 
transform  space  defined  under  the  Weyl-Heisenberg  and  affine  classes.  These  methods 
can  be  applied  to  analyzing  and  coding  a  wide  range  of  electromagnetic  and  acoustic 
phenomena. 

Problem  Statement 

The  purpose  of  this  research  is  to  develop  non-linear  lateral  inhibitoiy  networks 
for  coding  and  compressing  the  affine  and  Weyl-Heisenberg  wavelet  representations  of 
speech.  The  compression  schemes  developed  are  also  tested  for  enhancing  noisy  speech. 
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Assumptions 

In  order  to  perform  the  analyses  described  above,  all  speech  signals  are  sampled  at 
8  kHz.  Energy  in  speech  drops  at  the  rate  of  6  dB/octave  above  600  Hz  and  there  is  little 
above  4  kHz.  Thus,  8  kHz  is  a  usable  sampling  rate. 

The  speech  signals  are  recorded  in  a  functioning  signal  processing  laboratory  without 
guarding  against  background  noise.  In  addition,  additive  white  Gaussian  noise  (AWGN) 
is  used  to  corrupt  the  speech  signals  in  the  noise  filtering  analysis. 

Scope 

The  vocabulary  used  in  this  thesis  is  the  spoken  digits,  zero  through  nine.  In 
addition,  the  phrase 


Hot  today;  chilly  and  cold  tomorrow 

is  used  to  test  the  effects  of  continuous  speech.  This  phrase  is  chosen  because  of  the  good 
balance  between  the  fricatives  and  the  vowel  sounds  found  in  the  words. 

Standards 

The  results  of  compressing  the  speech  signals  in  this  effort  are  compared  against 
the  results  of  similar  research  conducted  at  New  York  University,  Courant  Institute  of 
Mathematical  Sciences,  and  at  the  Air  Force  Institute  of  Technology  (AFTT)  using  more 
conventional  speech  compression  schemes. 

The  data  compression  and  noise  filtering  analyses  are  compared  against  the  analyses 
of  Alenquer  [1],  Bashir  [4],  Kabrisky  et  al.  [38],  and  McMillan  [49].  The  research  cited 
assumes  that  speech  can  be  coded  using  a  relatively  small  number  of  spectral  components 
[48, 57].  The  spectral  components  were  found  via  the  short-time  Fourier  transform  using  a 
Hamming  window  with  50%  overlap .  The  spectral  components  were  then  funcated  using 
a  rule-based  system.  Thus,  the  comparison  is  between  different  representation  schemes 
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as  well  as  different  compression  schemes — a  rule-based  system  versus  a  dynamical  LIN 
system.  A  Anal  comparison  is  made  between  the  compression  results  of  this  research  and 
those  of  Mallat  and  Zhong  [45].  In  that  study,  the  investigators  were  able  to  completely 
reconstruct  signals  from  only  the  local  extrema  of  the  afhne  wavelet  representation. 

Equipment 

All  software  development  is  performed  on  a  NeXT  computer.  This  machine  has  a 
resident  signal  processor,  the  Motorola  56001  Ehgital  Signal  Processor,  which  is  used  for 
inputting,  digitizing,  and  processing  all  voice  signals. 

The  NeXT’s  built-in  microphone  jack  is  connected  to  an  analog-to-digital  converter 
known  as  the  CODEC  (coder-decoder).  The  CODEC  uses  a  8012.8  Hz  sampling  rate  and 
8-bit  ^-law  non-uniform  quantization. 

Approach 

The  8  kHz  sampled  speech  is  decomposed  into  time/frequency  components  using 
three  different  wavelet  transforms.  These  are  the  short-time  Fourier  transform,  the  Gabor 
transform,  and  the  affine  wavelet  transform.  Although  not  usually  considered  as  wavelet 
transforms,  the  short-time  Fourier  transform  and  the  Gabor  transform  are  special  cases  of 
wavelet  operators  in  Weyl-Heisenberg  frames  [31, 32]  (see  Chapter  II  for  further  details). 

After  decomposition,  the  respective  wavelet  time/frequency  components  of  each 
signal  are  then  compressed  by  the  LIN.  It  is  then  determined  whether  these  compressed 
representations  can  yield  intelligible  speech  after  reconstruction.  If  so,  the  results  produced 
from  each  representation  are  compared.  These  results  are  also  compared  to  the  results 
found  in  [4,  45,  49].  Bit  rates  are  then  estimated  for  each  representation.  This  rate  is 
compared  to  the  rate  determined  in  [49]  and  cuirent  state-of-the-art  bit  rates.  Finally, 
the  speech  signals  are  corrupted  by  AWGN,  processed,  compressed,  and  reconstructed. 
The  reconstructed  signals  are  tested  for  intelligibility  and  the  signal-to-noise  ratio  is 
established. 
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Summary 

This  introductory  chapter  has  outlined  the  main  problem  addressed  in  this  research 
effort  In  short,  this  research  investigates  a  general  method  of  compressing  wavelet  frame 
spaces  using  biologically  inspired  lateral  inhibition  networks.  More  specifically,  the 
combined  wavelet  transform/LIN  process  is  investigated  for  compressing  and  reducing 
noise  in  speech  signals.  A  brief  description  of  the  approach,  the  scope,  and  the  standards 
used  was  also  given.  In  Chapter  II,  the  wavelet  frame  is  defined  in  the  more  exact  language 
of  mathematics.  Preceding  this  definition,  a  brief  background  discussion  is  provided  of 
several  well  known  time/frequency  analysis  schemes,  including  the  short-time  Fourier 
transform  and  the  Gabor  transform,  and  how  they  fit  in  the  wavelet  frame  taxonomy. 
Finally,  the  chapter  concludes  with  the  biological  considerations  that  are  most  relevant  to 
this  research.  Chapter  III  describes  of  the  methodology  and  the  system  design.  Chapter 
IV  discusses  the  results  and  the  comparisons  to  similar  research.  Finally  in  Chapter  V,  the 
conclusions  and  recommendations  are  given. 
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II.  Literature  Review 


Introduction 

This  chapter  begins  with  a  brief  historical  perspective  of  a  number  of 
time/frequency  analyses.  Their  interrelationship  and  their  classification  under  the  more 
general  wavelet  frames  is  also  given.  The  short-time  Fourier  transform,  the  Gabor  trans¬ 
form,  and  the  affine  wavelet  transform  are  then  defined.  The  discussion  then  turns  towards 
the  biological.  Some  putative  dynamics  of  the  inner  ear  and  LINs  are  introduced.  These 
biological  considerations  are  important  because  they  highlight  the  shared  computational 
analysis  performed  by  the  inner  ear  and  operators  in  affine  frames.  Furthermore,  these 
biological  considerations  play  a  fundamental  role  in  the  design  of  the  system  used  in 
this  research.  The  full  connection  between  the  biology  and  affine  frames,  and  how  this 
connection  influences  the  system  design  is  found  in  Chapter  III. 

Background 

In  1946,  Gabor  [23]  introduced  a  method  of  representing  signals  simultaneously  in 
time  and  frequency.  A  year  later,  he  demonstrated  this  by  using  an  optical/mechanical 
apparatus  he  designed  and  called  the  frequency  converter  [23].  The  device  sampled  the 
speech  signal — which  was  recorded  on  35mm  sound  film — optically  decomposed  the 
signal  into  the  Gabor  spectrum  and  compressed  the  sampled  signal  onto  a  photocell  via 
microscope  lenses,  and  finally  reconstructed  the  signal  by  refocusing  the  image  on  the 
photocell  onto  another  sound  film.  Although  much  different  in  design,  the  frequency 
converter  worked  on  a  similar  principle  to  that  of  the  channel  vocoder  developed  by 
Homer  Dudley  a  few  years  earlier  [17].  In  essence,  this  principle  determines  a  sliding 
interval  of  the  signal  and  decomposes  that  portion  of  the  signal  into  its  constituent 
frequency  components.  One  can  think  of  the  result  of  this  analysis  as  a  two  dimensional 
representation  of  the  signal  with  time  on  one  axis  and  frequency  on  the  other.  The 
resolution  in  both  the  time  and  frequency  axes  is  determined  by  the  chosen  interval. 
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This  type  of  analysis  differs  from  formal  Fourier  analysis,  which  would  decompose 
the  entire  time  signal  into  frequency  components  in  a  one-dimensional  representation. 
Although  one  has  access  to  all  the  frequencies  of  the  signal  using  this  method,  one  can 
not  determine  when  in  the  signal  these  frequencies  occur.  Knowing  where  particular 
frequencies  occur  in  a  speech  signal  is  crucial  in  speech  processing. 

The  vocoder  can  be  thought  as  computing  the  short-time  Fourier  transform  [58:341- 
344] — also  referred  to  as  the  complex  spectrogram — therefore,  it  is  not  by  accident  that 
the  Gabor  transform  and  the  short-time  Fourier  transform  are  related  [6].  As  the  name 
implies  the  short-time  Fourier  transform  determines  the  frequency  components  of  short 
time  samples  of  a  signal.  The  most  important  distinction  between  the  two  is  that  the 
Gabor  transform  is  more  localized  in  frequency  than  the  spectrogram,  i.e.,  with  the  use 
of  a  Gaussian  window  function,  the  Gabor  representation  “provides  the  best  spectral 
information  for  every  point  along  the  signal  variation”  [56:452]. 

Notwithstanding,  Gabor’s  method  did  not  become  popular  with  the  communica¬ 
tion  and  signal  analysis  communities  mainly  owing  to  the  lack  of  an  explicit  method 
of  determining  the  Gabor  coefficients — due  to  the  non-orthogonality  of  the  Gaussian 
window — and  the  frequency  converter’s  impracticality  for  communication  purposes  [24]. 
Questions  concerning  the  completeness  and  uniqueness  of  the  Gabor  expansion  were  also 
raised.  The  short-time  Fourier  transform,  thus,  became  one  of  the  main  analytical  tools  in 
speech  processing  and  continues  to  be  today. 

Interest  in  the  Gabor  representation  waxed  in  the  early  1980s  as  a  result  of  two 
key  findings.  Bastiaans  [5,  6]  found  a  method  of  computing  the  Gabor  coefficients  by 
means  of  an  auxiliary  function.  The  importance  of  the  finding  rests  with  the  fact  that  the 
coefficients  can  be  determined  using  the  efficient  FFT  [19].  Janssen  [36],  on  the  other 
hand,  proved  the  completeness  of  Gabor  expansions;  however,  he  also  proved  that  these 
expansions  are  not  unique. 

It  is  interesting  to  note  that  the  Gabor  representation  has  become  popular  in  image 
analysis  [7,  16,  56,  67];  however,  it  is  still  not  very  popular  for  analyzing  temporal 
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signals.  Gabor  originally  introduced  his  frequency/time  representation  in  order  to  study 
the  compression  of  speech  for  communication  purposes  [24] — the  main  subject  revisited 
in  this  research. 

Gabor’s  contention  that  the  Gaussian  window  provides  the  best  resolution  in  time 
and  frequency — first  proven  by  Weyl  [68] — has  recently  been  questioned  by  [35,  61]. 
Weyl’s  proof  is  based  on  finding  the  product  of  the  individual  uncertainties  in  time 
and  frequency.  Wechsler  and  his  colleagues  [35,  61]  argue  that  “...the  proper  way  to 
measure  conjoint  resolution  of  a  joint  representation  is  to  derive  such  a  measure  directly 
from  the  joint  representation  itself’  [35].  Furthermore,  these  authors  show  that  the 
Gabor  representation  and  the  short-time  Fourier  transform  are  smoothed  versions  of  the 
(cross-)Wigner  distribution  [9,  10,  11]. 

Quite  recently,  the  (cross-)Wigner,  Gabor,  and  short-time  Fourier  transforms  have 
been  classified  as  special  cases  of  Weyl-Heisenberg  (W-H)  frame  integral  operators 
[31,  32].  Duffin  and  Schaeffer  [18]  introduced  frames  for  nonharmonic  Fourier  series 
expansions  [15]  in  the  early  1950s.  In  simple  terms,  a  frame  consists  of  a  set  of  kernel 
functions  or  wavelets,  which  can  completely  characterize  a  given  function  by  inner 
products  of  that  function  with  the  set  of  wavelets.  Frames  can  be  distinguished  from 
bases  in  that  neither  orthogonality  nor  uniqueness  is  a  requirement  for  frames.  Frames  are 
divided  into  two  main  classes:  W-H  frames  and  affine  frames.  As  will  be  shown  in  the 
next  section,  the  two  frame  classes  are  characterized  by  two  distinct  types  of  wavelets. 

This  background  gave  a  veiy  brief  introduction  to  a  variety  of  time/frequency  anal¬ 
yses  and  their  interrelationship.  The  short-time  Fourier  transform,  the  Gabor  transform, 
and  wavelet  transforms  in  W-H  and  affine  frames  are  more  explicitly  defined  in  the  next 
few  sections. 

The  Short-Time  Fourier  Transform 

For  convenience,  all  signals  are  limited  to  one-dimensional,  continuous-time, 
square-integrable  signals  in  this  chapter.  More  specifically,  y,g  £  where  y 
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describes  the  signal  function,  g  describes  the  window  or  wavelet  function,  and  is 
the  Hilbert  space  over  the  reals.  However  these  restrictions  are  not  required  [32].  In 
particular,  the  definitions  and  properties  that  follow  are  generalizable  to  two-dimensional 
spatial  images  and  discrete  signals  as  well.  All  proofs  to  the  following  properties  may  be 
found  in  the  cited  material. 

As  was  mentioned  in  the  previous  section,  the  short-time  Fourier  transform  finds 
the  spectral  characteristics  of  short  segments  of  the  signal  y{t).  More  specifically, 

/oo 

y{t)g'it  ~  t)  c\p\-j2'nut]dt  (1) 

•OO 

where  ’  denotes  complex  conjugation,  and  t,T,i/  e  3?.  As  can  be  seen  from  Eq  (1),  the 
short-time  Fourier  transform  is  a  function  of  two  variables,  r  and  u,  associated  with  time 
and  frequency  respectively  [6].  The  function  g{1)  is  known  as  the  window  function.  In 
wavelet  parlance,  g{t)  is  the  mother  wavelet,  and  Y{t,u)  amounts  to  the  inner  product 
of  the  translates  and  modulates  of  g(t)  with  y(r)  [32].  The  original  function  y(t)  can  be 
reconstructed  from  Yir,  u)  by 

y{l)  =  /  /  y{T,^)9il  -  t)  exp[j27ri/t]  dr  du  (2) 

Eq  (2)  holds  for  any  g.  A  simple  proof  of  this  was  provided  by  Helstrom  [33].  Eq 
(2)  describes  how  the  short-time  Fourier  transform  expands  a  signal  into  a  continuous 
set  of  translated  and  modulated  wavelets  or  window  functions.  The  short-time  Fourier 
transform  is  also  known  for  discrete  signals  and  has  been  used  for  years  in  speech 
and  signal  processing  [58].  Notwithstanding,  the  signal  processing  perspective  of  the 
short-time  Fourier  transform  is  somewhat  different  from  the  perspective  just  provided. 
The  distinction  is  important;  however,  it  will  not  be  dealt  with  until  Chapter  III. 
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The  Gabor  Representation 

In  contrast  to  the  short-time  Fourier  transform,  the  Gabor  representation  expands 
a  signal  y{t)  into  a  discrete  set  of  translated  and  modulated  elementary  [23]  or  window 
functions  Gm,k{t)  [6].  Gm,k{t)  is  defined  as  follows: 

Gm,k{l)  =  -  "i")  exp[j2nkl3t]  (3) 

where  k,m  €  Z.  The  parameters  a  and  (3  are  associated  with  the  uncertainties  At  and 
A/,  respectively,  found  when  attempting  to  simultaneously  define  a  signal  in  both  time 
and  frequency  [23:432].  The  uncertainty  relationship  between  At  and  A/  is  defined  as 

AtA/>l/2  (4) 

This  relationship  is  known  as  the  Heisenberg  uncertainty  principle  in  the  quantum 
mechanics  analogue.  Weyl  found  that  a  Gaussian  window  g{t)  (see  Eq  (10))  minimized 
Eq  (4)  [23,  68]. 

With  this  in  mind,  the  Gabor  expansion  of  j/(t)  is  defined  as 

OO  OO 

Jt=— OO  Tn  =  — OO 
OO  OO 

=  E  E  sit  - "»«)  exp[;27rfc^t]  (5) 

OO  m  =— 00 

As  stated  earlier,  because  of  the  non-orthogonality  of  the  Gaussian  g,  solutions  to 
the  Gabor  coefficients  were  not  found  until  recently.  It  so  happens  that  an  infinite  number 
of  solutions  to  Cm^k  are  possible  [36].  Bastiaans  [5, 6]  found  the  following  solution: 

=  /  yit)  7*(f  -  ma)  e\p[-j27rk/3t]  dt  (6) 

— OO 
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The  function  7(<)  is  known  as  the  biorthogonal  function  of  g{t)  and  is  defined  by  the 
relationship 


—  ma)  e\p[~j2i:kpt]dt  =  6k  6^ 


(7) 


where  (5„  =  1  for  n  =  0,  and  (5„  =  0  for  n  ^  0  (the  Kronecker  delta)[6,  19,  22].  The 
biorthogonal  function  7(f),  in  turn,  can  be  found  via  the  Zak  transform  [22,  37].  The  Zak 
transform  of  g{t)  is  defined  as 


{Zg){T,u)=  g{T  +  kT)  exp[-j2iTkuT], 

k=—oo 

0  <  r  <  T,  0  <  CJ  <  T"'  (8) 


where  T  is  the  sampling  interval.  One  can  now  obtain  7(f)  by 

^(r)  =  /‘ - — -  (9) 

^  Jo  iZgY{T,u;) 

For  Gabor’s  original  Gaussian  window 

/  J2\ 

=  (  y j  exp[-ir(t/Tf]  (10) 

Bastiaans  [5,  6]  found  its  biorthogonal  function  to  be 
/  ,  \«/2 

7G(0=(y-^  {Ko/tt)  ^^^exp[n{t/Tf]  ^  (-l)"exp[-7r(n  +  l/2)^]  (11) 

n+\/2>t/T 

where  Kq  =  1.85407468.  Eqs  (10),  and  (11),  with  T  =  1,  are  plotted  in  Figure  1. 

Others  have  investigated  alternative  window  functions  for  the  Gabor  representation 
[6,  22].  Of  special  interest  is  Friedlander  and  Porat’s  [22]  investigation  of  the  one-sided 
exponential  window  gE{t)  for  representing  transient  signals.  The  function  gE{t)  and  its 
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Figure  1 .  The  Gaussian  Window  (left)  and  its  Biorthogonal  Function  (right) 
biorthogonal  function  7f:(#)  are  defined  as  follows: 

gsit)  =  exp[-t/r]  u(f )  (12) 

7£(t)  =  exp[t/r]  [-u(f  +  1)  +  2u(t)  -  u(f  -  1)]  (13) 

where  u{t)  is  the  unit  step  function  and  r  is  the  time  constant.  The  plots  of  Eqs  (12),  and 
(13)  are  illustrated  in  Figure  2. 


Figure  2.  The  Exponential  Window  (left)  and  its  Biorthogonal  Function  (right) 
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Janssen  [37:26]  found  the  following  alternative  method  of  determining  the  Gabor 
coefficients  using  the  Zak  transform  of  the  signal  and  window  functions: 

Cmk=  [  /  {Zy){T,u!){Zg)~\T,Lj)  expUlnrurt  -  jlnkut]  dr  du>  (14) 
Jo  Jo 

The  main  advantage  of  Eq  (6)  over  Eq  (14)  is  that  Eq  (6)  can  be  conveniently  and  easily 
approximated  via  the  FFT  [19,  22](see  Eqs  (39),  (40),  and  (41)). 

The  two-dimensional  Gabor  representation  is  illustrated  in  Figure  3.  As  the  figure 
shows,  the  magnitude  of  the  Gabor  coefficients  IC'm.tl  are  shown  uniformly  spaced  in  time 
and  frequency  as  defined  by  a  =  At  and  i3  =  A/. 


Figure  3.  The  Gabor  Time/Frequency  Analysis 


Wavelets 

Introduction  In  this  section  some  very  basic  definitions  of  wavelets  and  their 
associated  frames  are  given.  Up-to-date  surveys  of  wavelets  and  their  associated  frames 
have  been  written  by  Heil  and  Walnut  [31,  32J.  Reference  [31]  is  geared  towards 
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the  engineer.  The  main  contributions  to  the  field  of  wavelets  include  the  works  of  I. 
Daubechies,  A.  Grossman,  Y.  Meyer,  and  J.  Morlet  [15, 13,  14,  28].  Other  introductions 
to  wavelets  can  be  found  in  [12, 66]. 

As  was  discussed  at  the  outset  of  this  chapter,  the  short-time  Fourier,  Gabor,  and 
(cross-)Wigner  distributions  are  examples  of  W-H  fi^e  operators  [32].  These  transforms 
compute  the  inner  product  of  a  given  function  y{t)  with  the  translates  and  modulates  of  a 
function — or  wavelet — g{t).  Affine  frame  operators  are  characterized  by  transforms  that 
also  compute  the  inner  product  of  a  function  with  a  wavelet;  however,  in  contrast  to  W-H 
frames,  these  wavelets  are  translated  and  dilated — or  constricted,  depending  on  how  the 
analysis  is  defined.  This  is  shown  in  the  following  subsection. 

One  of  the  developments  in  frame  research  that  is  of  particular  importance  to  signal 
processing  is  that  a  method  for  finding  wavelets  that  produce  orthonormal  bases  in  affine 
frames  has  been  found  [13].  Orthogonality  is  important  when  reconstruction  from  a 
compressed  set  of  frequency  components  ir  rcnnircd.  The  issue  of  orthogonality  will  be 
revisited  in  greater  depth  in  the  i.cxt  chapter. 

Affine  Wavelet  Transform  The  affine  wavelet  transform  is  defined  in  this  subsection. 
To  differentiate  between  the  two  types  of  wavelets  and  to  use  the  notation  found  in  the 
wavelet  literature,  let  correspond  to  a  wavelet  function  in  an  affine  frame.  The  affine 
wavelet  transform  is  then  defined  as  [32] 

{Wy){a,e'’)  =  f  y{t)  ' eH  -  a)  dt  (15) 

J-oo  '  ' 

where  a,  6,  e  G  3?.  The  function  y{t)  can  be  reconstructed  from  {Wy){a,  c*)  by 

yit)  =  J  f*")  -  a)  da  dh  (16) 

As  in  the  case  of  the  other  transforms  discussed  in  the  preceding  two  sections,  the  discrete 
versions  of  Eqs  (15),  and  (16)  exist  [12]. 
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A  specific  case  of  a  wavelet  is  now  defined  and  used  to  explain  the  consequence  of 
Eq  (15)  [12].  Consider  an  orthonormal  basis  described  by  translating  and  constricting  the 
mother  wavelet  ipit),  where  r/jit)  is  defined  by 


i’it)  = 


1,  0  <  t  <  1/2 
-1,  l/2<t<l 
0,  otherwise 


(17) 


where  —  fc);  l,k  e  A/";  and  the  support  of  is  the  interval,  I,  defined  by 

the  inequality  0  <  2‘t  —  k  <  1.  This  orthonormal  basis  xpf  is  known  as  the  Haar  basis 
[13].  Eq  (17)  is  plotted  in  Figure  4. 
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Figure  4.  The  Haar  Mother  Wavelet 

Eq  (15),  with  in  the  integrand,  calculates  the  frequencies  proportional  to  2' 
contained  in  y  about  the  interval  I.  In  a  linear  filtering  perspective,  Eq  (15)  can  be 
modeled  by  correlating  y  with  a  series  of  overlapping  bandpass  filters  that  increase  in 
bandwidth  logarithmically — base  2  in  this  example-as  a  function  of  the  filter’s  center 
frequency. 

An  alternative  explanation  can  be  defined  in  terms  of  a  multilevel  resolution  analysis. 
Let  I  denote  the  level  of  analysis.  Furthermore,  for  simplicity,  let  the  analysis  be  performed 
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over  the  inteival  I,  defined  by  the  inequality  0  <  f  <  1,  of  the  signal  y{t).  At  the  first 
level,  /  =  0,  the  analysis  of  y{t)  is  performed  with  the  mother  wavelet  The 

analysis  amounts  to  the  inner  product  of  the  two  functions.  At  the  next  level  of  resolution, 
;  =  1 ,  the  wavelet  is  constricted  or  shrunk  along  the  time  axis  by  1  /2,  and  an  inner  product 
is  again  computed.  Since  the  interval  covered  by  the  wavelet  at  this  resolution  is  7/2, 
the  wavelet  is  translated  once  (A:  =  1)  and  a  second  inner  product  is  computed  in  order 
to  cover  the  entire  interval  7.  The  wavelet  at  this  level  becomes  V’f  =  —  k), 

where  A:  =  0, 1.  The  analysis  at  subsequent  levels  is  performed  similarly  by  constricting 
the  mother  wavelet  by  2~‘  at  level  I  and  computing  the  inner  product  of  y{t)  and  rpf  for 
/:  =  0, 1 , . . . ,  2'  —  1.  The  Haar  wavelets  at  levels  0-3  and  at  all  translations  to  cover  7  are 
shown  in  Figure  5. 
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Figure  5.  The  Haar  Wavelet  at  Four  Resolutions 

The  Fourier  transform  of  has  a  bandwidth  proportional  to  2‘.  This  means  the 
bandwidth  is  doubled  at  each  subsequent  level  of  resolution.  The  time/frequency  'inalys’ j 
{performed  by  xpf  is  illustrated  in  Figure  6.  The  vertical  bars  correspond  to  the  amplitude 
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of  the  correlation  coefficients  that  result  from  the  wavelet  analysis  just  described.  The 
value  of  the  coefficients  are  defined  by  their  height,  which  is  arbitrary  in  this  example  for 
instructive  purposes.  The  coefficients  are  also  shown  as  positive  values  to  simplify  the 
drawing.  The  frequency  axis  is  labeled  /,  and  the  time  axis  is  labeled  t.  Figure  6  shows 
that  the  interval  of  the  analysis  performed  by  a  wavelet  at  each  subsequent  level  decreases 
by  one  half  and  its  bandwidth  doubles,  as  expected  from  the  previous  discussion. 


Figure  6.  The  Affine  Wavelet  Time/Frequency  Analysis 

Meyer  and  others  have  defined  an  entire  class  of  smooth  orthonormal  bases  of  the 
form  ti’i.  [13],  which  perform  log2 

analyses  of  signals.  However,  the  logarithmic  analysis  of  the  wavelet  transform 
does  not  have  to  be  in  base  2.  The  affine  wavelet  transform  analyzes  functions  in  a 
logarithmic  scale  of  base  b  in  general.  This  characteristic  is  what  differentiates  affine 
wavelet  analysis  from  W-H  fhune  analysis.  The  W-H  wavelet  analysis  localizes  uniformly 
in  time  and  frequency,  whereas  the  affine  wavelet  analysis  localizes  logarithmically  in 
time  and  frequency.  This  difference  can  be  seen  by  comparing  the  illustrations  of  the 
Gabor  time/frequency  analysis  and  the  affine  wavelet  time/frequency  analysis  in  Figures 
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3  and  6.  Thus,  the  affine  transform  may  have  advantages  over  W-H  transforms  for  speech 
processing  and  other  nonstationary  signal  processing,  because  it  analyses  signals  over 
multiple  scales  of  resolution.  In  contrast,  for  example,  the  Gabor  analysis  requires  that  a 
specific  At  and  A/  be  defined  for  a  single  resolution  analysis.  It  may  not  always  be  easy 
to  find  these  values. 

The  analysis  of  signals  in  a  logarithmic  frequency  scale  performed  by  the  affine 
wavelet  transform  is  also  performed,  to  a  great  extent,  by  the  inner  ear.  In  the  following 
section,  the  parallels  between  the  dynamics  of  the  ear  and  wavelet  frames  are  explored. 

Biological  Considerations 

Introduction  This  section  provides  a  very  brief  introduction  to  the  anatomy  and 
physiology  of  hearing.  The  discussion  concentrates  around  the  topics  of  most  relevance 
to  this  research.  For  a  comprehensive,  current  introduction  to  the  science  of  hearing  see 
Gulick,  et  al  [29].  Most  of  the  material  covered  in  the  next  subsection  is  a  summary  of 
portions  of  that  text. 

The  Auditory  Periphery  The  peripheral  auditory  system  consists  of  the  external 
ear,  the  middle  ear,  and  the  internal  ear.  This  system  acts  as  a  transducer,  which  converts 
air  sound  pressure  into  electrical  impulses  that  our  brains  can  process.  In  addition,  the 
peripheral  auditory  system  acts  as  a  system  of  bandpass  filters.  These  filters  define 
the  frequency  range  that  animals  can  perceive.  In  humans,  this  frequency  range  is 
approximately  20  Hz-20  kHz  [29:217]. 

The  external  ear  is  composed  of  the  pinna,  the  external  meatus,  and  the  tympanic 
membrane.  The  pinna  and  the  meatus — commonly  referred  to  as  the  ear  and  ear  canal 
respectively — act  as  a  first  stage  filter  that  increases  the  amplitude  of  frequency  components 
in  the  range  of  approximately  500  Hz-10  kHz  [29:87].  At  the  end  of  the  meatus,  lies  the 
flexible  tympanic  membrane,  or  ear  drum,  which  is  set  into  complex  patterns  of  vibratory 
motion  by  external  sound  waves.  The  tympanic  membrane  forms  the  boundary  between 
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the  external  ear  and  middle  ear.  Thus,  to  summarize  the  function  of  the  external  ear,  sound 
waves  are  collected,  filtered,  and  propagated  to  the  tympanic  membrane  by  the  pinna  and 
the  meatus.  The  tympanic  membrane  then  responds  to  these  sound  waves  by  vibrating. 

Beyond  the  tympanic  membrane  lies  the  middle  ear.  The  middle  ear  is  comprised 
of  a  cavity  containing  a  chain  of  three  small  bones:  the  malleus,  incus,  and  stapes.  This 
chain  connects  the  tympanic  membrane  to  the  oval  window  of  the  cochlea.  The  fluid  filled 
cochlea  is  the  main  sensory  organ  of  hearing  and  comprises  the  internal  ear.  The  function 
of  the  bony  chain,  therefore,  is  to  transmit  the  vibratory  action  of  the  tympanic  membrane 
to  the  sensory  receptors  in  the  cochlea,  and  to  provide  an  impedance  match  between  free 
space  and  the  liquid  filled  cochlea. 

The  cochlea — Latin  for  snail — is  a  coiled  cavity  in  the  temporal  bone  containing 
three  fluid  filled  chambers:  the  scala  vestibuli,  scala  media,  and  scala  tympani.  A  stiff 
membrane,  called  the  basilar  membrane  separates  the  scala  media  and  scala  tympani.  In 
humans,  the  basilar  membrane  is  approximately  35mm  in  length,  .5mm  wide  at  the  apex 
of  the  cochlea,  and  .08mm  wide  at  the  base  (towards  the  oval  window).  The  basilar 
membrane  has  a  logarithmic  gradient  of  stiffness  that  increases  along  its  length  from  the 
apex  to  the  base  of  the  cochlea.  Both  the  tapered  shape  of  the  basilar  membrane  in 
conjunction  with  the  stiffness  gradient  produce  the  logarithmic  filtering  characteristics 
mentioned  earlier.  A  collection  of  structures  called  the  organ  of  Corti  rests  on  the  basilar 
membrane.  The  most  important  of  these  structures  are  the  tectorial  membrane,  the  inner 
hair  cells,  and  the  outer  hair  cells.  The  hair  cells  are  organized  along  the  basilar  membrane 
in  rows,  with  the  tectorial  membrane  covering  these  cells  like  a  canopy. 

The  cochlear  dynamics  are  now  described.  As  the  stapes  moves  back  and  forth  at 
the  oval  window,  longitudinal  waves  are  produced  in  the  fluid  of  the  interior  chambers  of 
the  cochlea.  These  waves  produce  localized  oscillations  of  the  basilar  membrane  whose 
positions  are  a  function  of  the  external  frequency  of  the  sound.  For  example,  a  1  kHz  tone 
produces  a  maximum  pattern  of  displacement  at  a  precise  location  along  the  length  of  the 
basilar  membrane.  The  action  of  the  basilar  membrane  produces  shearing  forces  on  the 
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inner  hair  cells  via  the  tectorial  membrane.  When  this  occurs,  the  inner  hair  cells  produce 
electrical  signals  that  are  transmitted  to  subsequent  processing  layers  of  the  brain  along 
the  cochlear  nerve.  Therefore,  the  cochlea  would  signal  the  occurrence  of  a  pure  1  kHz 
input  by  breaking  into  oscillations  at  a  specific  place  along  its  length. 

Complex  sounds  composed  of  a  number  of  frequencies  are  encoded  along  the 
basilar  membrane  by  oscillatory  displacements  at  specific  locations  corresponding  to  the 
frequencies  of  the  sound.  In  other  words,  the  basilar  membrane  can  be  thought  as  a  spatial 
frequency  template  or  tonotopic  map.  In  addition  to  this  spatial  information,  temporal 
information  is  also  produced  by  the  firing  rates  of  the  inner  hair  cells.  The  preferred 
or  characteristic  frequency  (CF)  of  an  inner  hair  cell  is  the  frequency  that  produces  the 
maximal  displacement  of  the  basilar  membrane  where  that  hair  cell  resides.  Hair  cells 
with  CFs  of  4  kHz  and  below  fire  at  the  same  rate  as  their  CF.  This  phenomenon  is  known 
as  phase  synchrony.  It  is  not  known  how  the  brain  uses  this  spatio/temporal  information; 
however,  it  is  known  that  tonotopicity  is  maintained  at  all  the  relay  stations  leading  to  the 
neocortex,  as  well  as  at  the  primary  sensory  fields  of  the  auditory  cortex.  The  assumption, 
thus,  is  that  spatial  patterns  are  important  to  the  brain.  The  meaning  of  the  temporal 
information  is  more  difficult  to  decipher. 

The  spatial  encoding  characteristics  of  the  basilar  membrane  should  not  be  construed 
as  only  a  simple  Fourier  analysis  capability.  The  situation  is  much  more  complex  than 
that  and  still  largely  not  understood.  This  is  illustrated  by  a  phenomenon  called  masking 
[29:300-313].  As  was  mentioned  earlier,  a  pure  tone  produces  localized  patterns  of 
electrical  activity  in  fibers  in  the  auditory  nerve  whose  CFs  coincide  with  the  tone’s 
frequency.  It  has  been  found  that  the  firing  rate  of  an  auditory  nerve  fiber  can  be 
suppressed  or  masked  by  presenting  tones  of  similar  frequencies  [62].  Furthermore, 
masking  is  not  possible  after  the  masking  tones  have  deviated  from  the  CF  by  a  certain 
amount.  Therefore,  masking  occurs  within  certain  frequency  bands  called  critical  bands 
[29:306].  How  these  bandwidths  are  determined  is  a  subject  of  controversy  [21,  50,  70]; 
however,  the  general  consensus  is  that  the  masking  phenomenon  suggests  that  the  cochlea 
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acts  as  system  of  overlapping  bandpass  filters.  Although  there  is  a  question  as  to  exactly 
how  wide  these  filters  are,  it  is  known  that  their  bandwidths  increase  logarithmically  as  a 
function  of  the  CF.  Patterson  [54]  has  determined  the  shape  of  these  filters  using  masking 
experiments  and  found  them  to  be  cone  shaped.  Figure  7  illustrates  this  phenomenon. 
The  intensity  and  bandwidth  of  the  filters  shown  in  Figure  7  are  not  precise;  they  are 


Figure  7.  Filter  Model  of  the  Cochlea;  Adapted  from  [29:307] 

meant  to  illustrate  the  filter  function  of  the  cochlea. 

The  simple  description  of  the  anatomy  and  physiology  of  the  cochlea  just  given 
does  little  justice  to  the  complexity  of  this  system.  Much  is  still  not  known  about  the 
function  of  the  cochlea,  in  particular  the  feedback  system,  called  the  efferent  system, 
which  incorporates  the  outer  hair  cells. 

As  is  evident  from  the  previous  section,  the  computational  analysis  performed 
by  the  ear  is  reminiscent  of  the  analysis  performed  by  the  affine  wavelet  transform. 
The  subsequent  processing  of  information  in  the  numerous  auditory  nuclei  beyond  the 
cochlear  nerve  is  not  understood.  However,  evidence  for  one  type  of  processing  has  been 


22 


accumulating  over  the  last  15  years.  This  type  of  processing  is  called  lateral  inhibition,  or 
is  also  known  as  on-center  off-surround  processing.  LINs  are  examined  next  subsection. 

Lateral  Inhibition  Networks  Lateral  inhibition  networks  were  first  found  by  Hartline 
and  Ratliff  [59, 60]  in  the  eye  of  the  horseshoe  crab  (Limuius).  Since  Ratliff’s  discovery, 
lateral  inhibition  has  been  found  to  be  a  common  processing  strategy  in  all  sensory  systems 
across  species  [64],  including  the  auditory  system  [40, 63].  LINs  perform  several  functions; 
the  most  important  being  contrast  enhancement  In  contrast  enhancement  spatial  peaks 
and  edges  as  well  as  temporal  changes  are  highlighted.  Other  important  functions  of  LINs 
include  noise  suppression,  automatic  gain  control,  buffering  of  information  for  short-term 
memory,  and  a  variety  of  oscillatory  functions.  These  effects  have  been  modeled  and 
studied  by  many  over  the  years  [20,  25, 27, 59, 60, 63]. 

As  discussed  above,  many  LIN  models  have  been  developed  (see  [26]  for  a  review). 
The  one  adopted  in  this  research  is  based  on  the  non-linear,  shunting  cell  membrane 
equation  (see  Eq  (18))  found  by  Hodgkin  and  Huxley  [26,  34]  in  electrophysical  studies 
of  the  giant  squid  axon.  In  a  study  that  investigated  the  effects  of  lateral  inhibition  on  the 
output  of  a  cochlea  model,  Shamma  [63]  used  an  additive  version  of  the  shunting  equation 
in  his  LIN  model.  Elias  and  Grossberg  [20],  however,  have  shown  that  the  shunting  model 
is  much  less  susceptible  to  oscillatory  behavior  than  the  additive  model,  and  therefore 
justifies  the  use  of  the  shunting  model  in  this  research.  Before  describing  the  membrane 
equation,  a  structural  description  of  a  typical  LIN  will  be  presented. 

Figure  8  illustrates  the  cell  interactions  of  a  recurrent  LIN.  The  cells  or  nodes  of 
the  system  are  labeled  v„.  Each  node  has  an  input  g„  and  produces  an  output  /(a:„).  The 
parameter  x„  is  called  the  activity  of  node  v„.  The  arrows  illustrate  excitatory  connections 
or  positive  gain  communication  lines,  and  the  small  circles  illustrate  inhibitory  or  negative 
gain  connections.  To  simplify  the  illustration,  only  the  connections  exiting  node  i',  are 
shown.  The  outputs  of  all  other  nodes  are  considered  to  be  similar  to  the  outputs  of  node 
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Figure  8.  Lateral  Inhibition  Network  Model 

As  Figure  8  shows,  LIN  interaction  can  described  as  a  process  cell  self  excitation 
and  neighborhood  inhibition.  The  pattern  of  excitation  does  not  have  to  be  limited  to 
individual  nodes.  Groups  of  nodes  can  mutually  excite  one  another  and  in  turn  inhibit 
a  surrounding  neighborhood.  The  networks  are  also  not  restricted  to  one  dimension.  In 
fact,  two-dimensional  LINs  are  the  ones  found  in  the  cortex. 

The  activity  and  output  equations  for  each  node  can  now  be  defined  as  in  Eq  (18) 
and  in  Eq  (19)  respectively.  Although  the  and  are  functions  of  time,  the  variable  t 
has  been  omitted  in  the  following  equations  for  simplicity. 
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All  properties  of  Eq  (18)  can  be  found  in  [20,  25,  27].  Eq  (19)  defines  a  sigmoidal 
function.  Other  sigmoid  functions  have  been  used,  in  particular  f{x)  =  1/(1  +  exp[— x]). 

The  shunting  Eq  (18)  is  in  the  form  of  Eq  (20)  found  by  Hodgkin  and  Huxley  [34], 
which  describes  the  membrane  potential  V  of  neurons  . 

Cj^V  =  {VP-V)f  +  (V+  -  V)g+  +  {V-  -  V)g~  (20) 

C  relates  to  a  constant  lipid  membrane  capacitance.  The  constants  V^,  V"'*',  and  V~ 
relate  to  the  passive,  excitatory,  and  inhibitory  membrane  potential  saturation  points 
respectively.  In  the  Hodgkin  and  Huxley  model,  these  saturation  points  indicate  the 
equilibrium  potential  of  specific  ions.  The  terms  gP,  g'^,  and  g~  are  the  passive,  excitatory, 
and  inhibitory  ionic  condu  lances  respectively.  By  simple  changes  of  variables,  Eq  (18) 
and  Eq  (20)  can  be  ated.  Capacitance  C  becomes  constant  one  in  the  shunting  model. 
Membrane  potential  V  is  characterized  by  activity  i,.  Membrane  potential  saturation 
points  V",  V'^,  and  V~  become  constants  0,  A,  and  B  respectively.  Likewise,  the  passive 
ioni^  conductance  g^  equals  one,  the  excitatoiy  conductance  g'*'  equals  the  input  to  the 
cell  p,  plus  feedback  /(z,),  and  the  inhibitory  conductance  g~  equals  an  inhibitory  gain 
D  times  the  sum  of  the  outputs  of  the  competing  neurons  Jn  reality,  there  are 

many  more  ionic  conductances  than  those  shown  in  Eq  (20)  (see  [64]),  but  these  can  be 
grouped  functionally  into  the  three  main  types  of  conductances:  passive  conductances, 
that  bring  the  cell  back  to  its  resting  potential;  excitatory  conductances,  that  raise  the 
cell  membrane  potential  towards  the  signaling  potential;  and  inhibitory  conductances,  that 
suppress  the  cell  membrane  potential  and  cell  firing. 

How  do  LINs  relate  to  what  has  been  described  thus  far?  One  of  the  main  problems 
addressed  in  this  research  is  finding  compression  codes  for  speech.  LINs  have  properties 
which  make  them  potentially  useful  for  eliminating  redundant  frequency  or  correlation 
terms  in  wavelet — W-H  or  affine — transform  space.  This  will  be  addressed  in  depth  next 
chapter. 
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Summary  and  Final  Comments 

In  this  chapter,  wavelets  and  wavelet  frames  were  introduced.  To  recap,  wavelet 
frames  are  divided  into  two  classes:  Weyl-Heisenberg  frames  and  affine  frames.  Most 
time/frequency  distributions,  including  the  short-time  Fourier  transform  and  the  Gabor 
transform,  belong  in  the  Weyl-Heisenberg  frame  class.  These  distributions  are  charac¬ 
terized  by  transforms  that  compute  the  inner  product  of  a  function  with  translates  and 
modulates  of  a  wavelet.  The  affine  wavelet  transforms,  on  the  other  hand,  are  charac¬ 
terized  by  transforms  that  compute  the  inner  product  of  a  function  with  translates  and 
dilates  of  a  wavelet.  It  was  also  shown  that  the  affine  wavelet  transform  can  be  thought 
as  computing  a  correlation  between  a  signal  and  a  series  of  overlapping  bandpass  filters 
whose  bandwidths  increase  logarithmically  as  a  function  of  frequency.  Evidence  was 
then  presented  that  suggests  that  the  peripheral  auditory  system,  particularly  the  cochlea, 
performs  a  similar  analysis  to  that  of  the  affine  wavelet  transform.  The  final  section  dealt 
with  the  architecture  and  dynamics  of  LINs. 

Next  chapter  consists  of  the  system  design  for  decomposing,  compressing,  and 
reconstructing  speech.  The  full  connection  between  LINs  and  wavelet  transforms  is 
shown. 
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III.  System  Design 


Introduction 

This  chapter  details  the  methods  used  to  decom|)ose,  compress,  and  regenerate 
speech  signals  in  this  research  effort  Three  transforms  are  used  for  this  purpose:  the 
short-time  discrete  Fourier  transform,  the  discrete  Gabor  transform,  and  the  discrete  affine 
wavelet  transform.  In  the  first  section,  the  software  and  signal  processing  environments 
are  described.  Several  definitions  and  the  notation  used  throughout  the  remainder  of  this 
document  is  provided  immediately  after.  The  remainder  of  this  chapter  concentrates  on 
defining  the  discrete  versions  of  the  transforms  introduced  in  the  last  chapter,  and  on  the 
subsequent  interaction  of  LINs.  A  variety  of  window  or  wavelet  functions  are  explored 
for  each  transform  defined.  Compression  is  achieved  by  eliminating  local  low  energy 
correlation  or  frequency  components  produced  by  the  varied  transforms  via  the  dynamics 
of  LINs.  The  interaction  between  these  networks  and  the  transform  outputs  is  fully 
discussed. 

System  Environment 

As  was  mentioned  in  the  introductory  chapter,  all  digital  signal  processing  and 
software  development  is  performed  on  the  NeXT  computer.  The  speech  signals  are 
sampled  at  8012.8  Hz  by  the  CODEC  and  quantized  using  8-bit  //-law  logarithmic 
compression  [65:76-78].  Each  sampled  and  quantized  speech  signal  is  then  stored  in 
Objective  C  files  by  resident  software  routines.  All  further  processing  of  the  speech 
data  is  performed  by  software  written  in  a  combination  of  Objective  C,  ANSI  C,  and  C 
macros  wrapped  around  assembly  language  vector  processing  routines  for  the  resident 
Motorola  56001  Digital  Signal  Processor — hereafter  called  the  DSP.  Most  array  processing, 
including  FFTs,  is  performed  by  the  DSP.  Whatever  processing  is  not  performed  by  the 
DSP  is  performed  by  the  NeXT’s  Motorola  68030  in  floating  point  format. 
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The  data  in  the  speech  files  are  easily  accessible  using  Objective  C  routines.  These 
routines  convert  the  data  from  8-bit  /x-law  to  a  16-bit  linear  format.  The  speech  data 
are  again  converted — this  time  to  floating  point — ^so  that  the  data  can  be  scaled  between 
values  small  enough  to  prevent  overflow  errors  in  the  DSP,  All  values  in  the  DSP  must  be 
maintained  between  -1  and  1  in  a  24-bit  linear  format;  thus,  the  floating  point  values  are 
scaled  then  converted  to  24-bit  linear.  The  linear  scaling  function  S(x)  used  is  described 
as 

S{x)  =  — — —{x„-m„)  +  mn  (21) 

Xq  7^0 

where 

Xo  =  the  maximum  value  of  the  range  of  x 
7710  =  the  minimum  value  of  the  range  of  x 
Xn  =  the  maximum  value  of  the  new  range 
run  =  the  minimum  value  of  the  new  range 

Outputting  the  processed  speech  is  nearly  a  reversal  of  the  process  just  explained. 
The  24-bit  DSP  data  are  converted  to  floating  point  and  processed  further  if  need  be. 
The  completely  processed  data  are  finally  converted  back  to  16-bit  linear  and  stored  in 
Objective  C  sound  files.  Resident  software  using  mouse  activated  windows  can  now  be 
used  to  activate  the  CODEC  and  play  back  the  synthesized  sound  files. 

Definitions  and  Notation 

The  discussion  and  definitions  in  Chapter  II  dealt  with  the  decomposition  and 
reconstruction  of  continuous  time  signals  y{t).  In  this  and  subsequent  chapters,  operations 
will  be  performed  on  discrete  sequences  y{n),  where  n  e  Z.  Discrete  y{n)  is  obtained 
from  the  continuous  time  signal  by  sampling  xj{t)  at  rate  XjT.  More  precisely, 

y{n)  =  y{t)\t=nT  =  y{nT)  (22) 
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Time  dependence  is  dropped  by  omitting  the  sampling  period  T  in  Eq  (22). 

Where  possible,  the  discrete  Fourier  transform  (DFT) — by  means  of  the  FFT — will 
be  used  to  decompose  signals.  The  DFT  and  inverse  DFT  are  defined  as 

N-\ 

X{k)  =  ^  x{n)  expl—jlirkn/N],  0  <  k  <  N  —  \  (23) 

n=0 

7V-I 

x(n)  =  l/N  exp\j2irkn/Nl  0<n<N -I  (24) 

fc=0 

respectively,  where  N  is  the  number  of  samples  in  x{n)  and  1/A'  is  the  fundamental 
digital  frequency  in  cycles.  Therefore,  the  frequency  index  k  is  related  to  the  digital  and 
analog  frequencies  /*.  and  v>ki  respectively,  as  follows  [43:287]: 

k-^  f,  =  k/N^  Uk  =  klNT  (25) 

Frequency  Index  Digital  Frequency  Analog  Frequency 
(cycles)  (Hz) 

As  previously  mentioned,  the  sampling  rate  for  all  signals  is  1/T  =  8012.8  Hz.  All  DFT 
sequences  are  of  length  N  =  256,  which  relates  to  a  window  of  NT  =  31.95  msec.  Thus, 
the  fundamental  analog  frequency  is  fi  =  \/NT  =  31.3  Hz.  A  31.95  msec,  window 
length  is  reasonable  since  it  is  known  that  speech  is  relatively  stationary  (i.e.,  the  statistics 
of  the  signal  remain  constant)  in  periods  of  approximately  30  msec,  or  less. 

The  windowed  versions  of  Eq  (23)  and  Eq  (24)  are  used  for  short-time  Fourier  and 
Gabor  decomposition.  The  next  two  sections  detail  both  the  decomposition  and  synthesis 
of  speech  signals  using  these  methods. 

Short-Time  Fourier  Decomposition! Reconstruction 

Introduction  In  the  previous  chapter,  the  short-time  Fourier  transform  was  intro¬ 
duced  and  described  in  the  perspective  of  the  more  general  theory  of  frames;  in  particular, 
Weyl-Heisenberg  frames.  In  that  perspective,  decomposition  of  signals  amounts  to  a 
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projection  onto  a  basis  defined  by  g{t  —  T)exp[j2Tn4],  or  the  translates  and  modulates 
of  g{t).  The  bases  defined  in  this  manner  need  not  be  orthogonal  to  form  complete  sets. 
However,  for  discrete  translations  and  modulations  of  g — as  in  the  case  of  the  Gabor 
expansion — care  must  be  taken  when  defining  g.  The  non-orthogonality  of  the  Gaussian 
g  created  difficulties  in  finding  a  solution  to  the  Gabor  coefficients  until  Bastiaans  and 
Janssen  found  the  solutions  defined  earlier. 

The  traditional  perspective  taken  by  the  signal  processing  community  is  somewhat 
different.  As  in  the  perspective  of  frames,  successive  intervals  of  the  desired  signal 
to  be  decomposed  are  analyzed  one  at  a  time.  However,  the  analysis  of  the  signal  is 
always  considered  to  be  an  orthogonal  expansion  into  a  combination  of  basis  functions 
defined  by  the  complex  exponent,  e\p[j2in>t].  In  other  words,  the  window  function  g  is 
not  considered  part  of  the  basis  but  part  of  the  signal  to  be  analyzed.  The  choice  of  g 
depends  on  the  ease  by  which  the  original,  or  close  approximation  of  the  function  y  can  be 
resynthesized  from  the  windowed  expansion,  and  on  the  specific  frequency  characteristics 
of  g  itself  (see  Harris  [30]  for  a  study  on  windows  and  their  characteristics).  Thus,  the 
question  of  whether  the  function  g  produces  an  orthogonal  set  or  not  rarely  comes  up.  An 
in-depth  treatment  of  discrete  short-time  Fourier  analysis  is  found  in  [58:250-354]. 

The  Short-Time  Discrete  Fourier  Transform  Three  window  functions  are  used 
in  the  short-time  Fourier  decomposition  of  speech  signals  in  this  research.  These  are 
the  rectangular  window,  the  Hamming  window,  and  the  compactly  supported  Gaussian 
window.  These  windows  along  with  the  short-time  versions  of  Eqs  (23)  and  (24)  are 
defined  in  this  section.  In  addition,  a  modification  to  the  inverse  short-time  DFT  is  defined, 
which  is  necessary  for  reconstructing  good  quality  speech  from  a  compressed  Fourier 
space. 

The  rectangular,  Hamming,  and  compactly  supported  Gaussian  windows  are  defined 
as 
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Rectangular; 

9R{n) 

Hamming: 

gnin) 

Gaussian: 

Pg(”) 


=  < 


=  < 


1,  0<n<iV-l 
0,  otherwise 

0  54  —  0.46 cos [27rn/(.V  —  1)],  0  <  n  <  —  1 

0,  otherwise 

exp[-4(2^)2],  0<n<iV-l 
0,  otherwise 


(26) 


(27) 


(28) 


The  plots  of  each  of  these  windows  are  shown  in  Figures  9,  10,  and  1 1  respectively. 


Figure  9.  The  Rectangular  Window 

The  decomposition  and  reconstruction  formulas  for  y{n)  depend  on  the  window 
used.  This  is  shown  by  first  defining  the  general  form  of  the  short-time  DFT  pair  as 

s-\ 

Y{m,  k)  =  ^  y{n)  g{n  —  mN)  exp[—j2nkn/N],  0  <  k  <  N  —  \  (29) 

n=0 

N-1 

y{n)g{n  —  mN)  =  1/A^  ^  Y(m,k)  expljlnkn/N],  0  <  n  <  —  I  (30) 

k=0 


31 


where  m  e  M.  When  a  rectangular  window  is  used,  y{n)  can  be  decomposed  and  exactly 
recovered  by  Eq  (30).  Since  gR{n  —  mN)  =  1  by  definition,  y{n)  is  recovered  exactly 
by  summing  over  all  window  shifts.  More  precisely 

Af-l 

Yiy(^)9R{n-mN)  =  y{n)  (31) 

m=0 

where  M  represents  the  number  of  subsequences  of  length  N  that  describe  the  finite  se¬ 
quence  y(n).  Meaning  that,  y{n)  is  of  length  MN.  The  decomposition  and  reconstruction 
formulas  for  y{n)  with  a  rectangular  window  thus  becomes 

iV-l 

y'nim,  k)  =  ^  y{n)  gR{n  —  mN)  exp[— j'27rfcn/A''],  0  <  A:  <  A'  —  1  (32) 

n=0 

A/-1  ;V-1 

y{n)  =  1/A’  Z  Z  >'«(  m,k)  exp[j27r/:n/A'],  0  <  n  <  A^  —  1  (33) 

nj=0  <:=0 

If  g{n)  is  not  rectangular  then  the  right  side  of  Eq  (33)  will  not  result  \nxj[n)  but 
in  y{ix)g{n  -  mA^),Vm.  In  other  words,  the  result  is  y{n)  multiplied  by  all  translations 
of  g{n).  Figure  12  shows  this  result.  The  top  graph  shows  the  original  y(n),  a  sampled 
sinusoid.  Directly  underneath,  the  consequence  of  Eq  (33),  in  the  case  g  is  Gaussian,  is 
shown. 

Nevertheless,  an  approximation  of  j/(n)  can  still  be  reproduced  if  the  windows 
are  overlapped.  That  is,  if  g{n)  is  repeatedly  translated  by  /?  <  A’,  then  y{n)  can  be 
approximated.  To  illustrate  this.  Figure  13  shows  the  plot  of  Sg{n)  =  ~  ^■^)> 

for  R/N  =  0.50,  or  50%  overlap.  The  ideal  result  of  summing  the  overlapped  window 
function  would  be  the  unit  function,  as  illustrated  by  the  gray  line.  In  this  ideal  case,  y 
could  be  exactly  recovered.  However,  it  can  be  seen  that  for  the  Gaussian  window  a  low 
amplitude  wobble — with  a  fundamental  firequency  of  62.6  Hz  in  the  present  case-results. 

The  error  in  j/(n)  can  be  reduced  with  more  overlap  of  g[n).  Figure  14  shows  this 
for  Sg{n)j2  with  R/N  =  0.25.  The  plot  is  scaled  to  show  the  small  peak  error  in  this 
case,  which  is  approximately  15  x  10“^.  A  similar  situation  arises  when  the  Hamming 
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Figure  12.  Effect  of  Windowing  with  non-Rectangular  Functions 
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Sg(n) 


Figure  13.  Sum  of  Gaussian  Windows  with  50%  Overlap 

window  is  used.  Figure  15  shows  the  plot  of  Sh{n)  =  Er  duin  -  rR),  with  R/N  =  0.50. 
As  can  be  seen  from  this  figure,  the  peak  error  in  this  case— approximately  5.6  x  10"^ — is 
less  than  the  peak  error  produced  by  the  Gaussian  window  with  25%  overlap.  It  is  known 
that  the  error  produced  in  the  speech  signal — by  processing  the  signal  in  the  manner  just 
described — is  barely  perceptible  if  a  Hamming  window  with  50%  overlap  is  used.  The 
result  of  using  the  compactly  supported  Gaussian  window  in  this  process  is  investigated 
next  chapter. 

The  overlap  and  add  method  described  above  is  more  precisely  defined  by  the 
following  transform  pair: 

v-i 

y(r,  fc)  =  ^  y{n)  g(n  -  rR)  exp[-j27rfcn/iV],  0  <  k  <  N  -  \  (34) 

n:=:0 

N/R{Sf-\)N-\ 

y{n)  =  XjN  ^  y(r, A:)  exp[72;rfcn/7V],  0  <  n  <  A'  -  1  (35) 

r=0  it=0 
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Figure  14.  Sum  of  Gaussian  Windows  with  25%  Overlap 


Figure  15.  Sum  of  Hamming  Windows  with  50%  Overlap 
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where  y{n)  denotes  the  approximation  of  y{n).  In  this  research,  R/N  =  0.50  is  chosen 
in  all  cases. 

One  last  modification  must  be  made  for  reconstructing  the  speech  signal  from  the 
compressed  Fourier  components.  Before  defining  this  modification,  the  consequence 
of  compressing  in  the  frequency  space  is  considered.  Theory  states  that  in  order  to 
avoid  time  aliasing,  Y(m,k)  must  be  uniformly  sampled  at  N  frequencies  for  each 
interval  m  [58:270].  The  reconstruction  methods  defined  above  satisfy  this  condition. 
However,  as  previously  mentioned,  compression  in  this  thesis  amounts  to  eliminating 
many  frequency  components  in  each  defined  interval  of  the  signal  sequence.  What  remains 
is  an  underszunpled  version  of  y'(m,  k)  with  nonuniformly  spaced  frequencies.  Therefore, 
lime  aliasing  is  to  be  expected.  As  shown  in  the  next  chapter,  the  resulting  time  aliasing 
manifests  itself  as  noise  or  amplitude  errors  concentrated  around  the  window  ends  of 
the  synthesized  speech.  These  errors  can  be  attenuated  by  multiplying  the  reconstructed 
intervals  of  y(n)  by  a  Hamming  window.  The  Hamming  window  is  a  good  choice  since 
it  attenuates  the  signal  at  the  ends  of  the  window.  Also,  as  was  discussed  previously,  the 
Hamming  window,  when  overlapped  by  50%,  produces  very  little  distortion  in  the  signal. 
The  modification  leads  to  the  following  reconstruction  equation: 

2^f-\  A  -1 

y{n  )  =  K  ^  ^  Yir^k)  gnin  -  rNfl)  t\^\j2Txkn / N],  0  <  n  <  —  1  (36) 

r=0  it=0 

where  K  is  some  constant,  and  y{n)  is  the  approximation  of  y(n)  that  results  from  the 
compressed  short-time  Fourier  coefficients  Y{t,  k). 

Since  Y{r,  k)  is  complex,  Eq  (36)  can  be  expressed  in  the  following  way: 

yin)  =  K  X)  ^)|  -  rN/2)  exp\J2‘Kkn/N], 

r=0  k=0 

0<n<N  -I  (37) 


where  Or.k  is  the  phase  angle  of  y'(r,  k). 


37 


Furthermore,  since  y{n)  is  real  going  into  the  transform,  the  regenerated  approxi¬ 
mation  of  y{n)  should  be  real  coming  out  This  means  that  the  imaginary  terms  in  Eq 
(37)  can  be  omitted.  This  is  done  by  combining  the  exponential  terms,  applying  Euler’s 
identity,  and  eliminating  the  resultant  imaginary  term,  resulting  in 

2M-1  N~\ 

y{n)  =  K  ^  ^|y(r,fc)|g;,(n-riV/2)cos(27rA:n/A'  +  0,,t),  0<n<A’-l  (38) 

r=0  fc=0 

This  is  the  form  of  the  reconstruction  formula  used  by  [1, 4, 49]  in  their  investigations  of 
speech  coding  and  noise  reduction  using  a  subset  of  the  short-time  Fourier  space. 

To  summarize,  the  equations  used  to  decompose  and  reconstruct  the  signal  sequences 
depends  on  the  window  used  and  whether  the  short-time  Fourier  coefficients  have  been 
compressed  or  not.  If  the  window  function  is  rectangular  and  compression  is  not 
performed,  then  Eqs  (32)  and  (33)  are  used.  For  Gaussian  and  Hamming  windows,  the 
valid  transform  pair  is  described  by  Eqs  (34)  and  (35)  if  tlie  Fourier  coefficients  are  not 
compressed.  When  compression  is  performed,  Eqs  (34)  and  (36)  are  used  regardless  of 
the  window  used  to  compute  Y{r,k). 


Gabor  Decomposition/Reconstruction 

For  the  Gabor  decomposition  of  y{n),  Bastiaans’  solution  (see  Chapter  II)  is  adopted 
here.  In  general,  the  discrete  version  of  Eq  (6)  can  be  represented  as  (see  [19]) 

yV-l 

Cm,k  =  XI  e\p{-j2nkn/N],  0  <k  <  N  -  \  (39) 

n=0 


where 


<3-1 

y^„  =  ^  y{n  +  qN)  7*(n  -I-  qN  —  mN),  0<m<M  —  \,0<n<N— I  (40) 

<7=0 

where  Q  <  M.  The  parameter  Q  is  a  free  variable  in  Eq  (40)  and  determines  the  interval 
of  support  of  7.  The  consequence  of  varying  Q  is  discussed  next  chapter.  Eq  (39)  can  be 
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conveniently  computed  using  the  FFT. 


The  signal  y{n)  can  be  recovered  from  CTn,k  by 
M-\ N-\ 

y(”)  =  H  Cm,kg{n  -  mN)  exp\j2irkn/N],  No<n<2N-\  (41) 

m=0  fc=0 

where  Nq  depends  on  the  window  function  g{n). 

Eq  (41)  can  be  computed  using  the  FFT  algorithm.  Nevertheless,  another  method 
is  used  in  this  effort.  By  similar  arguments  used  in  the  previous  section,  Eq  (41)  can  be 
rewritten  as 


a;-i  a^-1 

y{^)  =  H  iC'm.tl  g{n  -  mN)  cos[27rA:n/iV  +  <?!)„,,<..],  No<n  <2N  -  \  (42) 

m=0  fc=0 

Eq  (42)  is  inconvenient  to  program  in  the  computer  since  the  cosine  term  must  be 
computed  each  time  for  every  different  phase  angle  that  results.  This  takes  considerable 
processor  time.  It  is  more  desirable  to  precompute  sine  and  cosine  tables  and  access  them 
from  memory  as  needed.  Fortunately,  Eq  (41)  can  be  rewritten  in  an  equivalent  form  of 
Eq  (42)  that  when  programmed  can  take  advantage  of  precomputed  tables.  That  form  is 

Af-I  A'-l 

y{^)—  ^^[^^(^rn,k)cos{2-Kkn/N)  —  Im(Cm,k)s\n(2TTkn/N)]g{n  —  mN), 

m=0  k=Q 

A 0  <  n  <  2N  -  1  (43) 


where 


^e(x)  = 

Im{x)  =  — (45) 

Two  window  functions  are  investigated  in  the  Gabor  expansion/regeneration  of 
y{n ).  These  are  the  sampled  versions  of  the  Gaussian  window,  defined  in  Eq  (10),  and  the 
one-sided  exponential  window,  defined  in  Eq  (12).  The  discrete  versions  of  these  signals, 
in  the  form  used  in  this  research  effort,  along  with  their  corresponding  biorthogonal 
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functions  are  defined  as 


Gaussian: 

gain) 

BiorthogonaJ  of  go (n): 

7g(«) 


2’'''‘exp{— 7r{n/A^)^],  —2N  <  n  <  2N  —  1 
.  ^  ^  -  -  (46) 

0,  otherwise 


I  £\(n)  ELo^2(n,0,  -QN  <  n  <  QN  -  1 
I  0,  otherwise 


(47) 


where 


e,(n)  =  (1/2)'^'^  (AV7r)~^''^exp  [tt  (n/A^)^] 
eiinj)  =  (-l)*-l”''^'‘'''''^J‘'''exp  [-7r  ([ln/iV|  +  1/2J  +  i  +  1/2)^] 

A'o  =  1.85407468  (48) 


Exponential: 

gsin) 

Biorthogonal  of  g£;(n): 

lE{n) 


exp[— n/A']  tt(n),  0  <  n  <  2N  —  1 
0,  otherwise 


(49) 


exp[n/N][— w(n  +  N)  +  2u(n)  —  w(n  —  A'')]  (50) 


where  [xj  is  the  largest  integer  less  than  or  equal  to  x,  (e.g.,  [3.7]  =  3).  Eqs  (46)  and 
(49)  define  Nq.  In  the  case  go  is  used  in  the  transform.  No  =  -2N;  when  gE  is  used, 
A'^o  =  0.  In  addition,  Eq  (50)  implies  Q  =  1  for  the  exponential  window’s  biorthogonal 
function.  The  sum  in  Eq  (11),  which  describes  the  continuous  biorthogonal  function  of 
the  Gaussian  function,  converges  very  rapidly.  This  is  reflected  in  Eq  (47),  where  the  sum 
is  iterated  over  only  three  values  of  the  variable  i. 

As  in  the  case  of  the  short-time  DFT,  when  the  Gabor  coefficients  are 
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compressed,  the  overlap  and  add  method  is  used  to  suppress  the  noise  that  results  from 
time  aliasing.  Thus,  the  Gabor  coefficients,  with  R/N=0.50,  are  found  by 

^-1 

Cr,fc  =  13  y--."  t\p[-j2irkn/N],  0  <  k  <  N  -  \  (51) 

n=:0 


where 

Q-i 

yr,«  =  E  y("  +  -  rN/2),  0  <  r  <  2M  -  0  <  n  <  N  -  \  (52) 

q—0 

A  few  steps  are  needed  to  obtain  the  approximation  of  the  signal  sequence  y(n).  The 
approach  taken  is  to  reconstruct  two  partial  sequences  of  y{n) — each  a  shifted  version  of 
the  other  by  N/2.  Each  sequence  is  then  windowed  by  repeatedly  shifted  versions  qh,  and 
finally  added  together — when  properly  aligned  in  time — ^to  produce  y{n).  More  explicitly, 
the  partial  sequences  are  defined  as 

y^(n)  =  ^  ^  ^Re(C2r,k)cos(2nkn/N)  -  Im{C2r,k)sir\{2nkn/!V)^  g(n  -  rA), 
r=0  fc=0 

No<n<2N-\  (53) 


and 


y2{n)  =  ^ 


r=0 


[f?e(C2r+i,<.)cos(27rfcn/iV)  -  /m(C2r+i,/b)  sin(27rfcn/A')j  g{n  -  rN), 

k=0 

Ao  <  n  <  2iV  -  1 


(54) 


where  Cr.jt  are  the  compressed  Gabor  coefficients.  Each  partial  sequence  is  now  windowed 
by  the  Hamming  function  as  follows: 


Mn)  = 

M-\ 

53  yiin)9H{n  - mN) 

Tn=0 

A/-I 

(55) 

min)  = 

Y.  min)gHin  -  mN) 

m=0 

(56) 
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Finally,  the  desired  approximation  of  y(n)  is  obtained  by 


y{n)  =  yi{n)  +  y2{n-N/2)  (57) 

Affine  Wavelet  Decomposition/ Reconstruction 

The  discrete  wavelet  transform  can  be  implemented  in  various  ways.  In  this  effort, 
the  straight  forward  inner  product  approach  is  used.  This  approach  simply  computes  the 
wavelet  coefficients  by  taking  inner  products  of  the  signal  with  translated  and  constricted 
versions  of  the  mother  wavelet.  The  signal  is  reconstructed  by  the  weighted  sum  of 
the  wavelets  at  all  shifts  and  resolutions.  The  weights  are  the  previously  computed 
coefficients. 

A  more  computationally  efficient  algorithm  was  developed  by  Mallat  [44].  In  that 
study,  the  original  wavelet  is  characterized  by  two  discrete  filters.  One  filter,  denoted  by  G, 
smooths  the  signal,  thereby  filtering  frequencies  higher  than  the  frequencies  of  the  current 
level  of  analysis.  The  second  filter,  denoted  by  H,  is  associated  with  the  wavelet  function 
and  is  derived  by  taking  the  desired  number  of  derivatives  of  the  smoothing  function. 
This  multi-stage  process  of  low-pass  filtering  followed  by  higher  order  derivative  filter 
correlations  has  been  used  in  the  past  in  spatial  image  [46]  analysis  and  has  also  recently 
been  used  to  develop  an  orthogonal  polynomial  transform  [47].  Although  less  efficient, 
the  more  direct  approach  is  embraced  here  for  educational  purposes  and  to  establish  a 
baseline  for  any  future  research. 

With  that  in  mind,  the  discrete  versions  of  Eqs  (15)  and  (16)  are  used.  In  general, 
these  equations  are  defined  as 

{Wy){k,2‘)  =  E  y(n)2'/V(2'n-fc)  (58) 

n=  — 00 

=  E  Y.  {Wy){k.2‘)2‘/^rf{2‘n-k)  (59) 

1=0  ib=-oo 
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where  l,k  G  Z.  The  parameter  2'  used  in  the  notation  for  the  wavelet  transform, 
{Wy){k,  2‘),  is  adopted  here  to  specify  the  log2  analysis  performed  by  the  transform. 

Since  in  practice  both  signal  and  wavelet  sequences  are  of  finite  length,  Eqs  (58) 
and  (59)  must  be  modified.  There  are  other  factors  to  consider  that  determine  the  final 
form  of  these  equations.  One  of  these  factors  is  the  mother  wavelet  to  be  used.  Two 
mother  wavelets  are  investigated  here:  the  Haar  wavelet,  defined  earlier,  and  the  Morlet 
wavelet  [51].  These  are  two  very  different  types  of  wavelet  The  sampled  Haar  wavelet  is 
a  real  valued  function,  which  is  compactly  supported  in  the  interval  0  <  n  <  N  —  \.  On 
the  other  hand,  the  sampled  Morlet  wavelet  is  a  complex  valued  function  centered  around 
n  =  0,  which  decays  to  zero  at  ±oo.  However,  for  practical  reasons,  the  sampled  Morlet 
wav  ’et  must  be  truncated.  These  mother  wavelets  are  defined  as 


Haar: 


Morlet: 


f 

1,  0<n<A72 

-  -1,  A72<n<iV-l  (60) 

0,  otherwise 

(exp  [—jujQn/N]  —  C)  exp  \—{n/N)^/2]  ,  —QN  <  n  <  QN  -  1 
<  ^  ■’  (61) 

0,  otherwise 


where 


uq  =  7r(2/ ln2)''^^ 
C  =  exp  [-(0^/2] 


(62) 

(63) 


The  plots  of  tiie  real  and  imaginary  portions  of  the  Morlet  wavelet  are  shown  in  Figure 
16.  As  can  be  seen  from  Eq  (61)  and  Figure  16,  the  mother  Morlet  wavelet  is  nearly  the 
Gabor  window  function  modulated  at  o/q.  One  can,  therefore,  think  of  the  Morlet  wavelet 
as  the  affine  counterpart  of  the  W-H  frame  Gabor  wavelet. 
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The  other  factors  that  remain  to  be  determined  are  the  number  of  samples  N,  and 
the  total  number  of  levels  L  necessary  to  cover  the  required  frequency  range  of  y{n) 
during  the  wavelet  analysis.  When  the  Haar  wavelet  is  used,  N  =  256  is  chosen.  This 
relates  to  an  analysis  at  level  /  =  0  of  approximately  31.3  Hz;  the  same  as  the  fundamental 
frequency  of  the  Fourier  and  Gabor  analyses.  Only  8  levels  (L  =  8)  of  analysis  are 
possible;  however,  these  are  sufficient  to  cover  the  frequency  range  of  y(n).  The  Haar 
wavelet  analysis  using  8  levels  produces  255  real  valued  coefficients  for  each  window  of 
N  points.  This  is  one  less  than  the  number  of  useful  coefficients  found  in  the  short-time 
Fourier  and  Gabor  decompositions  discussed  previously. 

Morlet  and  his  associates  [51:228]  suggested  that  a  complete  representation  of  a 
signal  can  be  computed  using  four  wavelets  per  octave.  The  completeness  of  the  set 
used  was  not  proven,  but  hypothesized  experimentally.  This  clearly  produces  many  more 
coefficients  than  in  the  radix  2  wavelet  analysis  proposed  here.  The  tradeoff  made  here  is, 
obviously,  fidelity  of  the  signal  versus  the  number  of  coefficients  used.  If  the  total  number 
of  coefficients  used  in  the  Morlet  case  are  going  to  approximate  the  number  of  coefficients 
used  in  the  other  analysis/synthesis  methods,  then  the  number  of  analysis  levels  have  to 
be  limited.  Consequently,  two  values  of  N  are  investigated  .  For  N  =  256,  the  wavelet 
analysis  is  computed  at  octaves  of  31.30  /o  Hz  «  26.58  Hz,  where  /o  =  u;o/27r.  In  this 
case,  8  levels  nearly  cover  the  frequency  range  of  y{n).  N  =  128  is  also  investigated. 
The  analysis  for  this  value  is  nearly  the  same  as  when  N  =  256;  however,  the  frequency 
analyzed  at  I  =  0  is  53.17  Hz.  Only  7  levels  of  analysis  are  possible  when  N  =  128. 
Table  1  summarizes  the  frequencies  analyzed-as  defined  by  the  parameter  N — at  the 
different  levels  of  the  wavelet  analysis.  All  values  in  the  table  are  in  Hz.  One  may 
wonder  why  /o  =  1  is  not  chosen  in  order  to  analyze  the  same  frequencies  as  in  the  Haar 
wavelet  analysis.  For  /o  =  1,  the  imaginary  portion  of  the  wavelet — at  the  highest  level 
of  resolution  (/  =  7)-would  be  sampled  at  the  zero  crossings.  Therefore,  at  this  level, 
errors  might  arise  since  the  imaginary  portion  of  the  analysis  would  always  be  zero.  This 
problem  does  not  arise  when  the  wavelet  is  defined  as  in  Eq  (61). 
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Table  1.  Frequencies  Analyzed  (in  Hz)  Using  Haar  and  Morlet  Wavelets 


Haar 

Morlet 

TV  =  256 

TV  =  256 

TV=  128 

31.30 

26.58 

53.17 

62.60 

53.17 

106.34 

Q 

125.20 

106.34 

212.67 

ill 

250.40 

212.67 

425.34 

500.80 

425.34 

850.68 

Ql 

1001.60 

850.68 

1701.36 

6 

2003.20 

1701.36 

3402.72 

7 

4006.40 

3402.72 

— 

All  the  necessary  information  is  now  available  to  define  the  wavelet  transform  pair. 
When  the  Haar  wavelet  is  used,  the  signal  y(n)  is  projected  onto  the  wavelet  basis  by 

{Wy){k,2‘)  =  E  y{n)2'l^r„{2'n  -  fciV),  0  <  fc  <  (2'  -  1)M  (64) 

n=0 

and  in  turn,  the  signal  y{n)  can  be  reconstructed  from  the  wavelet  coefficients  by 

y{n)  =  Y.  E  0<n<A^-l  (65) 

/=o  fc=0 

where  M  is,  again,  the  number  of  subsequences  of  length  N  that  describe  y(n).  In  the 
case  the  Morlet  wavelet  is  used,  the  wavelet  transform  is 

QN-\ 

{Wy){k,  2')  =  E  yi^)  2'^Vff (2'n  -  kN),  0  <  it  <  (2'  -  1  )M  (66) 

n=-QN 

and  the  inverse  transform  is 

y{n)  =  E  E  {Wy){k,2')2^I^M2'n  -  kN),  -QN  <n<QN-\  (67) 

/=o  fc=0 


These  equations  are  implemented  using  a  linked  binary  tree  data  structure  in  a 
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recursive  algorithm.  This  approach  best  fits  the  structure  of  the  equations  that  describe 
the  affine  transform  pair. 

These  transform  pairs  described  do  not  change  when  the  wavelet  coefficients 
{Wy){k,2^)  are  compressed  via  the  LIN.  Next  section  describes  how  the  LIN  truncates 
the  W-H  and  affine  frequency  spaces. 

Compression  Using  Lateral  Inhibition 

Before  describing  the  compression  process  performed  by  LINs,  it  is  instructive  to 
summarize  the  methods  of  compression  used  in  the  research  that  most  resembles  this 
effort.  Several  speech  compression  methods  using  short-time  Fourier  analysis  are  first 
presented.  Research  in  the  analysis  and  compression  of  speech  in  the  affine  wavelet  space 
is  quite  new  and  the  literature  sparse.  Also,  the  work  that  has  been  done  so  far  appears 
to  be  inconclusive.  However,  one  method  of  compression  investigated  by  Mallat  [45]  is 
very  closely  related  to  the  method  used  here  and,  therefore,  is  reviewed  in  this  section. 

Review  of  Fourier  Spectrum  Compression  Methods  McAulay  and  Quatieri  [48, 57] 
developed  a  method  of  reproducing  intelligible  speech  based  on  only  a  small  subset  of 
the  frequency  components  of  the  speech  signal.  In  their  research,  these  investigators 
decomposed  speech  using  the  short-time  Fourier  methods  similar  to  those  previously 
described.  Compression  was  acl  ved  by  picking  spectral  peaks  in  each  time  slice — 
usually  referred  to  as  a  frame  in  speech  processing — of  the  short-time  Fourier  spectrum 
(y(m,  k)).  The  peaks  were  chosen  by  a  combination  of  simple  peak-picking  and  a  handful 
of  rules  devised  to  match  peaks  from  frame  to  frame.  McAulay  and  Quatieri  [48]  reported 
that  anywhere  from  16  to  40  spectral  components  per  frame  were  sufficient  to  reproduce 
good  quality  speech.  How  they  rated  good  was  not  defined. 

McAulay  and  Quatieri ’s  research  inspired  several  research  efforts  at  the  Air  Force 
Institute  of  Technology  (AFTT)  in  speech  coding,  compression,  and  noise  reduction. 
Bashir  [4]  and  K.abrisky  et  al.  [38]  developed  a  system  to  improve  mutilated  speech  based 
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on  the  principles  developed  by  McAulay  and  Quatieri.  However,  the  spectral  peaks  were 
chosen  somewhat  differently.  The  glottal  frequency  was  determined  and  the  spectrum 
was  then  sampled  at  that  rate.  Rules  were  then  developed  to  search  on  either  side  of 
the  originally  chosen  peaks  to  determine  the  final  candidates.  A  rule  was  also  developed 
to  differentiate  between  voiced  and  unvoiced  speech.  That  system  was  reported  to  have 
“appreciably”  increased  the  the  quality  of  the  noisy  speech  [38]. 

In  another  AFTT  study,  Alenquer  [1]  investigated  that  same  peak-picking  algorithm 
for  speech  compression.  In  that  research,  it  was  thought  that  since  the  spectrum  of  speech 
rolls  off  at  about  6  dB/octave  above  625  Hz  or  so,  that  only  the  first  N  components  should 
suffice  to  reproduce  the  speech  signal.  Again  by  subjective  criteria,  =  16  was  found 
to  be  sufficient  to  reproduce  good  quality  speech  for  male  speakers.  The  system  was 
not  tested  on  female  speakers.  McMillan  [49]  has  modified  the  peak-picking  strategj^ 
further  by  first  designing  an  equalizing  filter  that  boosts  the  energy  of  the  higher  frequency 
components  to  match  those  of  the  lower  frequency  components.  The  spectrum  in  each 
frame  is  then  sampled  at  the  glottal  frequency — ^as  in  the  previous  two  investigations — 
however,  the  N  largest  components  are  chosen.  The  rules  to  search  for  the  final  frequency 
peaks  to  keep  were  also  expanded.  The  compressed  spectrum  is  passed  through  the  inverse 
of  the  equalizing  filter  before  the  speech  is  reconstructed.  No  voiced/unvoiced  decisions 
are  needed  with  that  system.  Currently,  seven  frequency  peaks  per  frame  appear  to  be 
sufficient  to  reproduce  good  quality  speech  of  either  male  or  female  speakers. 

The  common  theme  found  in  these  compression  algorithms  is  a  search  process 
designed  to  locate  the  dominant  frequency  peaks  of  the  spectrum.  All  other  frequency 
components  are  eliminated.  The  speech,  in  each  of  the  investigations  just  reviewed,  is 
reconstructed  by  summing  sinusoids  of  magnitudes  and  frequencies  corresponding  to  the 
chosen  frequency  peaks. 

Review  of  Affine  Wavelet  Spectrum  Compression  Methods  Mallat  [45]  has  proven 
that  under  certain  conditions  i  complete  reconstruction  of  a  signal  is  possible  from  only 
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the  local  extrema  of  the  affine  wavelet  frequency  space.  He  has  also  empirically  shown 
that  very  close  approximations  to  the  the  original  signal  can  be  produced  from  only  the 
relative  maxima  of  the  magnitude  of  the  wavelet  coefficients. 

The  process  begins  by  finding  the  local  extrema  of  the  wavelet  coefficients.  The 
eliminated  coefficients  are  then  approximated  by  cubic  spline  interpolations  and  the 
resulting  wavelet  space  sequences  are  projected  onto  a  convex  Sobolev  space.  The 
resulting  sequences  are  then  projected  back  out  to  the  signal  space,  and  again  into  the 
wavelet  space.  This  procedure  is  iterated  until  the  original  signal  is  obtained  to  the  desired 
accuracy.  Mallat  described  a  similar  process  in  the  case  the  relative  maxima  of  the  wavelet 
coefficients  are  chosen. 

LIN  Design  The  methods  used  to  compress  the  Fourier  spectral  characteristics  of 
speech  just  reviewed  have  relied  on  rule-based  systems  that  have  been  developed  in  an  ad 
hoc  manner.  Specific  knowledge  regarding  the  Fourier  spectral  characteristics  of  speech 
was  necessary  in  order  to  design  the  systems  based  on  Fourier  expansion.  As  is  shown  in 
this  section,  LINs  will  replace  the  rule  based  systems  for  compression  in  both  the  W-H 
frequency  space  and  the  affine  frequency  space.  In  contrast,  the  design  principles  of  the 
LINs  are  based  on  a  solid  mathematical  foundation  and  on  several  known  physiological 
phenomena  of  hearing. 

The  design  principles  used  in  this  thesis  are  meant  for  wider  application  than  just 
compressing  the  speech  spectrum.  The  choice  of  LINs  for  this  research  is  biologically 
motivated,  and  as  such,  specific  knowledge  about  how  the  hearing  mechanism  works  is 
used.  This  approach  differs  from  the  approach  of  the  previous  efforts,  since  the  emphasis 
is  not  placed  on  knowledge  about  source  signals  but  on  how  biological  systems  process 
these  signals.  Animals  across  species  use  the  basic  principles  adopted  here  in  nearly  all 
sensory  systems  to  preprocess  environmental  information.  These  design  principles  may, 
therefore,  be  applicable  to  coding  and  compressing  other  acoustical  and  electromagnetic 
signals  as  well  as  two-dimensional  spatial  images. 
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In  the  affine  wavelet  case,  Mallat’s  compression  and  reconstruction  methods  are 
based  on  a  more  solid  mathematical  foundation  than  the  other  compression  methods 
reviewed  here.  Nevertheless,  Mallat’s  iterative  process  does  not  appear  to  be  feasible 
for  real-time  processing.  As  shown  in  a  following  section,  the  local  maxima  of  the 
wavelet  spectrum  will  be  chosen  by  the  LIN  but  no  attempts  of  regenerating  the  eliminated 
coefficients  will  be  made.  One  of  the  main  reasons  for  this  is  to  compare  the  effects  of 
eliminating  coefficients  in  the  W-H  transform  space  versus  the  affine  wavelet  transform 
space  of  speech.  The  other  reason  for  not  using  Mallat’s  iterative  algorithm  is  for 
computational  reasons. 

UN  Compression  of  Short-Time  Fourier  and  Gabor  Spectra  To  recall  from 
Chapter  II,  LINs  are  found  in  the  subcortical  relay  stations  that  carry  patterns  of  electrical 
activity  from  the  auditory  nerve  to  the  primary  auditory  sensory  fields  of  the  cortex.  The 
main  function  of  the  LINs  in  that  system  is  presumed  to  be  contrast  enhancement  of 
spatio/temporal  information.  This  is  precisely  the  function  assigned  to  LINs  in  this  effort. 
The  magnitude  of  the  spectral  coefficients  of  speech  are  analogous  to  the  electrical  activity 
found  in  the  auditory  nerve.  The  dynamics  of  the  LINs  will  search  for  local  energy  peaks 
and  suppress  weaker  ones.  Nothing  is  assumed  about  the  incoming  signal.  The  only 
information  used  to  design  the  competitive  architecture  of  the  LINs  is  the  experimentally 
derived  filter  bandwidths  of  the  cochlea. 

The  LIN  design  for  W-H  frequency  compression  begins  with  the  numerical  or 
discrete  time  approximation  of  the  activity  equation  of  LIN  nodes  described  in  Eq  (18) 
in  Chapter  II.  Figure  8,  with  the  indexes  relabeled,  is  shown  again  in  Figure  17  for 
more  convenient  reference.  The  discrete  time  approximation  of  Eq  (18)  used  is  found  by 
Euler’s  method,  one  of  the  simplest  techniques  known  for  solving  differential  equations. 
As  long  as  the  inputs  to  the  LIN  are  normalized  or  scaled  between  zero  and  one,  Euler’s 
approximation  of  Eq  (18)  remains  stable  throughout  its  computation,  leading  to  steady- 
state.  with  reasonable  step  size  values.  Thus,  for  a  given  step  size  r,  the  activity  equation 
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Figure  17.  Lateral  Inhibition  Network  Model 


becomes 

o:k{t  +  1)  =  Xk{t)  +  r{{A  -  -  [B  +  x,{t)]h]  (68) 

where 


Ek  =  gk  +  f{xk) 

h  = 

Qk  =  |F(r,fc)|  or  |Cr,it| 


The  output,  Eq  (19),  is  modified  to  include  a  threshold  and  becomes 


fi^k)  = 


(Xk  -  0)^/l(Xk  -  +  Q^],  Xk>9 

0,  Xk  <  9 


(69) 
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The  variables  in  Eqs  (68)  and  (69)  are  chosen  as  A  =  I,  B  =  D  =  !,«  =  0.5,  and 
r  =  0. 1 .  In  addition,  the  result  of  varying  6  is  investigated  in  Chapter  IV  as  well. 

In  simple  terms,  Eq  (68)  says  to  increase  the  activity  Xk  of  node  Vk  by  a  maximum 
of  if  the  exciiatoiy  term  is  greater  than  the  inhibitory  term  (Ek  >  h),  or  decrease 
Xk  by  a  maximum  of  Bt  if  7*.  >  Ek.  In  other  words,  if  the  current  node’s  input  plus 
feedback  from  itself  is  greater  than  the  sum  of  the  outputs  of  the  competing  nodes,  then 
that  node’s  activity  will  increase  and  will  tend  suppress  the  activity  of  the  competing 
nodes.  Parameter  D  is  chosen  to  produce  winner-take-all  competition.  This  means  that 
the  node  with  the  highest  input  will  kill  off  the  activity  of  all  nodes  competing  with  it. 
In  this  way  the  spectral  coefficient  with  the  largest  magnitude  will  be  chosen  over  others 
within  a  defined  bandwidth. 

One  can  also  view  Eq  (68)  as  an  implementation  of  a  Mexican  hat  function  discrete 
filter.  Indeed,  the  plot  of  a  typical  implementation  of  the  transfer  function  (Hm)  of  this 
filter  resembles  a  Mexican  hat,  as  can  be  seen  from  Fig  (18).  This  particular  filter 
implementation  is  modeled  by  the  negative  of  the  second  derivative  of  the  Gaussian 
function,  or 

HM{k)  =  [1  -  {k/Nf]  exp[(fc/iV)V2]  (70) 

It  is  evident  from  Fig  (18)  where  the  term  on-center  off-surround  processing — the 
alternative  expression  for  lateral  inhibition — originated.  Frequency  components  in  the 
center  of  the  filter  are  magnified,  whereas  the  frequency  components  on  the  periphery 
of  the  filter  are  attenuated.  In  the  specific  case  of  the  LIN  used  in  this  thesis,  the  filter 
center  boosts  the  energy  of  only  one  spectral  component,  and  uniformly  attenuates  an 
equal  number  of  components  on  either  side  of  the  center.  These  filter  bandwidths  are  now 
defined. 

A  competitive  band  is  defined  to  be  a  set  of  competing  nodes.  Similarly,  a 
competitive  bandwidth,  denoted  as  Wc  €  Af,  is  the  number  of  nodes  in  a  competitive 
band.  The  competitive  bandwidth,  therefore,  has  a  frequency  bandwidth  W  associated 


Figure  18.  Typical  Mexican  Hat  Filter  Function 
with  it,  and  is  defined  by  the  following  relationship: 

W  =  3\.3Wcnz  (71) 

The  LIN  is  designed  to  have  overlapping  competitive  bands — or  overlapping  Mexican  hat 
filters.  The  competitive  bandwidths  of  these  overlapping  bands  are  chosen  to  approximate 
the  bandwidths  of  the  triangular  filter  models  of  the  cochlea.  As  one  may  recall  from 
Chapter  II,  the  bandwidths  of  these  overlapping  filters  increase  logarithmically  as  a 
function  of  frequency.  There  are  two  main  models  describing  these  bandwidths.  One  is 
called  the  equivalent  rectangular  bandwidth  (ERB)  [50];  the  other  is  called  the  critical 
bandwidth  (CB)  [70]. 

The  ERB  was  derived  to  have  the  following  function: 

WerbU)  =  6.23  f  +  93.39/  +  28.52  (72) 

The  function  ascribed  to  the  CB  is 

WcbU)  =  25  +  75(1  +  1.4/2)°-^  (73) 
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where  /  is  frequency  in  kHz.  A  plot  of  these  two  relationships  is  shown  in  Figure  19. 
This  plot  clearly  shows  that  the  ERBs  are  much  more  conservative  that  the  CBs,  especially 
for  frequencies  less  than  600  Hz.  A  LIN  designed  to  approximate  the  CB  model  will, 
therefore,  produce  more  compression. 


Figure  19.  ERB  and  CB  as  Functions  of  Frequency 

In  practice,  the  frequency  bandwidths  associated  with  the  competitive  bandwidths 
of  the  LIN  can  only  approximate  the  experimentally  derived  bandwidths  of  ihe  cochlear 
filters.  The  reason  for  this  is  that  each  LIN  node  is  chosen  to  compete  with  an  equal 
number  of  nodes  on  each  side  of  it  for  simplicity.  This  means  that  the  frequency 
bandwidths  of  the  LIN  cannot  increase  smoothly,  but  in  steps.  Table  2  summarizes  the 
competitive  bandwidths  Wc^,  and  the  associated  frequency  bandwidths  H4  of  the  LIN 
that  approximate  the  ERBs  and  CBs  of  the  cochlea.  The  column  labeled  ERB(CB)  CF 
shows  the  characteristic  frequencies,  CFs,  that  have  associated  ERBs(CBs)  equal  to  that 
of  the  frequency  bandwidths  Wi,  of  the  LIN.  In  other  words,  Werb(cb){CF)  =  Wk- 
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The  column  labeled  ERB(CB)  k,  shows  the  frequency  indexes  k  that  produce  the  analog 
frequencies  closest  to  the  ERB(CB)  CFs.  Figures  20  and  21  show  the  LIN’s  frequency 


Table  2.  Bandwidths  (in  Hz)  Analyzed  by  LIN 


m 

ERB 

CB 

CF 

mm 

CF 

mm 

k'k 

3 

93.9 

670 

21 

657.3 

— 

— 

— 

5 

156.5 

1264 

40 

1252.0 

947 

30 

939.0 

7 

219.1 

1820 

58 

1815.4 

1456 

47 

1471.1 

9 

281.7 

2344 

75 

2347.5 

1880 

60 

1878.0 

11 

344.3 

2842 

91 

2848.3 

2262 

72 

2253.6 

13 

406.9 

3318 

106 

3317.8 

2616 

84 

2629.2 

15 

469.5 

3773 

121 

3787.3 

2950 

94 

2942.2 

17 

532.1 

— 

— 

— 

3269 

104 

3255.2 

19 

594.7 

— 

— 

— 

3575 

114 

3568.2 

21 

657.3 

— 

— 

— 

3870 

124 

3881.2 

bandwidth  approximations  of  the  ERB  and  CB  curves,  respectively,  as  defined  by  Table 

2. 

To  summarize,  the  LIN  designed  to  compress  the  W-H  wavelet  spectrum  can  be 
thought  as  a  series  of  overlapping  Mexican  hat  filters.  The  bandwidths  of  these  filters 
are  chosen  to  approximate  the  putative  filter  bandwidths  of  the  cochlea,  which  increase  in 
bandwidth  logarithmically  as  a  function  of  the  CF.  The  function  of  these  filters  is  to  search 
for  energy  peaks  in  their  bandwidth  and  attenuate  all  others. 

Once  the  LIN  converges,  all  non-zero  output  nodes  are  used  as  pointers  to  the 
original  spectrum.  This  means  that  the  original  spectral  magnitudes  (of  tt^e  chosen 
coefficients)  are  used  to  resynthesize  the  signal,  and  not  the  values  of  the  outputs  of  me 
LIN  nodes. 


LIN  Compression  of  Affine  Wavelet  Spectrum  The  LIN  design  for  compress¬ 
ing  the  affine  wavelet  spectrum  is  based  on  more  speculative  criteria  than  what  was  used 
in  the  previous  section.  The  LIN  design  for  the  W-H  wavelet  spectrum  is  based  on  an 
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Figure  20.  LIN  Frequency  Bandwidth  and  ERB  vs  Frequency 

attempt  to  approximate  filter  bandwidths  found  in  the  cochlea.  These  filter  bandwidths  are 
nearly  constant  for  CFs  less  than  1  kHz  or  so,  and  then  increase  logarithmically  for  CFs 
above  1  kHz.  The  resultant  W-H  transform/LIN  process  is  a  forced  logarithmic  frequency 
analysis  of  the  input  signal.  This  analysis  highlights  the  frequency  peaks  or  changes  in 
each  time  slice,  or  frame.  The  match  between  each  frame  of  the  W-H  sp)ectrum  and 
the  LIN  is  straightforward,  since  both  are  one-dimensional  representations  of  frequency 
or  frequency  operators.  In  contrast,  the  affine  wavelet  transform  performs  a  logarithmic 
analysis  of  the  input  signal  to  begin  with.  Each  affine  wavelet  frame — defined  in  this 
case  as  the  interval  of  time  of  the  lowest  level  analysis — is  both  a  time  and  frequency 
representation. 

The  direct  counterpart  of  the  previous  design,  is  to  highlight  the  temporal  changes 
or  peaks  in  each  octave  band  or  analysis  level.  TTie  LIN  design  for  compressing  the 
affine  spectrum  at  each  analysis  level  is  identical  to  the  LIN  design  for  compressing  the 
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Figure  21.  LIN  Frequency  Bandwidth  and  CB  vs  Frequency 

W-H  spectrum;  however,  the  competitive  bands  remain  constant  for  any  given  level.  A 
competitive  band  in  the  present  case  now  corresponds  to  a  competitive  interval,  which 
is  denoted  as  Tc  €  AA.  The  relationship  between  the  competitive  interval  and  the  true 
interval  associated  with  it  in  this  effort  is 


Ti 


3\. 95  Tc 
- - msec. 


(74) 


where  I  is  the  level  of  analysis,  and  the  time  constant  31.95  msec,  was  found  in  the 
beginning  of  this  chapter. 

The  competitive  interval  for  each  level  is  determined  by  the  reciprocal  of  comp)etitive 
bandwidth  for  the  frequencies  analyzed  by  each  level  of  analysis.  In  this  case,  only  the 
CB  rule  is  used.  Table  3  lists  the  intervals  over  which  the  LIN  is  compressing  at  each 
level.  These  approximate  the  intervals  determined  by  the  CB  rule,  which  are  denoted  by 
Tcb  and  are  also  listed  in  Table  3. 
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Table  3.  Intervals  (in  msec.)  Analyzed  by  LIN 


1 

Tc 

T, 

Tcb 

0 

0 

0 

9.99 

1 

0 

0 

9.97 

2 

0 

0 

9.89 

3 

3 

11.98 

9.57 

4 

5 

9.98 

8.52 

5 

5 

4.99 

6.15 

6 

7 

3.49 

3.32 

7 

7 

1.75 

1.45 

Table  3  shows  that  no  compression  is  possible  in  the  first  three  levels  of  analysis 
using  the  CB  tuned  LIN  method  of  compression.  This  suggests  that  the  affine  decomposed 
speech  will  not  be  compressed  to  the  levels  that  the  W-H  decomposed  speech  is  capable 
of  being  compressed  with  LINs. 

Summary 

This  chapter  defined  the  decomposition,  compression,  and  reconstruction  algo¬ 
rithms  used  in  this  thesis.  The  chapter  began  with  a  description  of  the  software  and 
hardware  environments  along  with  some  basic  definitions  and  notation.  The  short-time 
Fourier,  Gabor,  and  affine  wavelet  transforms,  along  with  the  varied  window  or  wavelet 
functions  used,  were  then  precisely  defined.  Finally,  the  LIN  designs  for  compressing 
each  of  the  transform  derived  spectra  were  presented.  Briefly,  the  LINs’  overlapping 
bandwidths(intervals)  were  designed  to  approximate  the  bandwidths(intervals)  of  the 
theoretical  overlapping  filter  model  of  the  cochlea.  The  following  chapter  summarizes  the 
results  found  in  regenerating  speech  signals  from  the  compressed  spectra  obtained  from 
the  methods  described  in  this  chapter. 
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IV.  Results 


Introduction 

This  chapter  summarizes  the  results  obtained  from  decomposing,  compressing,  and 
regenerating  the  spoken  integers  by  the  methods  described  in  the  previous  chapter.  For 
economy,  the  results  are  shown  only  for  the  word  seven  spoken  by  the  author  unless 
otherwise  noted.  These  results  are  representative  of  the  results  obtained  from  the  entire 
vocabulary. 

This  chapter  is  organized  into  four  main  sections.  In  the  first  section,  the  results 
obtained  from  compressing  the  short-time  Fourier  and  Gabor  spectra  are  discussed.  These 
results  highlight  the  differences  obtained  from  using  the  various  window  functions  and 
various  LIN  configurations.  The  second  section  summarizes  the  results  obtained  from 
compressing  the  affine  wavelet  spectrum.  The  third  section  discusses  the  results  obtained 
from  the  noise  filtering  tests.  The  final  section  compares  the  results  summarized  in 
previous  sections. 

The  criteria  used  here  to  establish  the  best  results  are  mean  square  error  (MSE) 
between  the  original  time  signal  and  the  reconstructed  signal  (both  signals  are  amplitude 
normalized  before  the  MSE  is  computed),  and  subjective  listening  tests.  The  subjective 
tests  consist  simply  of  questions  regarding  the  intelligibility  and  quality  of  the  reconstructed 
speech  judged  by  randomly  chosen  listeners.  Since  this  research  is  only  a  preliminary 
study  in  the  LIN  compressing  capabilities  of  a  variety  of  spectra,  such  informal  testing 
seems  reasonable.  Any  subsequent  research  based  on  this  thesis  that  looks  to  optimize  the 
intelligibility  and  communication  bit  rates  of  speech  should  use  more  rigorous  listening 
tests  such  as  the  diagnostic  rhyme  test. 
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Results  Based  on  Short-Time  Fourier  and  Gabor  Spectra 

The  results  reported  in  this  section  are  organized  by  first  comparing  the  original 
discrete  signal  to  the  reconstructed  signal  from  all  the  spectral  coefficients  produced 
from  the  short-time  Fourier  and  Gabor  decompositions  defined  in  the  previous  chapter. 
The  results  of  compressing  the  varied  spectral  representations  with  the  LIN  are  then 
examined.  These  results  show  how  the  spectral  components  are  eliminated  as  the 
competitive  bandwidths  are  increased  and  the  LIN  output  threshold  is  increased.  Finally, 
the  reconstructed  signals  from  their  compressed  spectral  representations  are  compared  and 
ranked  by  MSE  and  auditory  quality. 

The  plots  of  the  original  and  the  reconstructed  signal  sequences  of  the  word  seven, 
from  the  various  short-time  Fourier  and  Gabor  representations,  are  shown  in  Figures  22 
through  32.  Immediately  after  each  of  these  plots,  the  difference  signal  between  the 
original  and  reconstructed  signals  are  shown,  rescaled,  in  order  to  better  demonstrate  the 
resultant  error.  All  signals  are  linearly  scaled  between  minus  one  and  one  in  order  to 
eliminate  irrelevant  scale  variations. 

The  plot  titles  describe  the  window  and  the  method  of  decomposition  used. 
Spectrogram  in  these  titles  refer  to  the  short-time  Fourier  decomposition,  and  Gabor  is 
self  explanatory.  For  example,  the  plot  labeled  Seven  Rectangular  Spectrogram  describes 
the  resynthesized  word  seven  derived  from  the  short-time  Fourier  spectrum  using  a 
rectangular  window.  The  plot  entitled  Seven  Original,  naturally,  is  the  plot  of  the  original 
signal.  The  word  seven  is  omitted  from  the  title  of  the  plots  illustrated  in  Figures  31  and 
beyond,  with  the  understanding  that  the  results  shown  are  derived  from  this  spoken  word. 
All  signal  sequences  in  this  section  are  discretized,  of  course,  but  are  plotted  with  points 
joined  and  give  the  illusion  of  continuous  time  signals. 

As  may  be  recalled  from  the  previous  chapter,  the  interval  of  support  of  the 
biorthogonal  function  is  determined  by  parameter  Q  in  Eqs  (40)  and  (52).  That  interval  of 
support  is  defined  by  the  inequality  -QN  <n<  QN,  where  N  is  the  number  of  points  in 
each  window.  By  definition,  Q  =  1  for  the  exponential  biorthogonal  function.  In  the  case 


60 


of  the  Gaussian  biorthogonal  function  ,  Q  <  M  [19].  Two  values  of  Q  were  chosen  in 
the  Gaussian  case:  Q  =  M,  for  the  maximum  support,  and  Q  =  2,  for  minimum  support. 
The  plots  labeled  Gauss  Gabor  are  of  reconstructed  signals  from  the  Gabor  representation 
computed  using  Q  =  M.  Likewise,  the  plots  labeled  Short  Gaussian  Gabor  are  associated 
with  Q  =  2.  This  nomenclature  will  be  adopted  hereafter  for  all  references  to  the  Gabor 
representations  computed  using  the  two  values  of  Q. 

The  plots  in  Figs  22  through  25  reveal  that  the  errors  produced  from  reconstructing 
the  signal  from  the  short-time  Fourier  spectrum  using  rectangular  or  Hamming  windows 
are  very  small  and  nearly  identical.  Taking  floating  point  and  other  arithmetic  errors 
into  account,  these  small  errors  are  to  be  expected  from  the  theory  described  in  the 
previous  chapter.  The  result  of  using  the  compactly  supported  Gaussian  window  produces 
a  larger  error  in  the  reconstructed  signal,  and  again  this  is  to  be  expected  from  the 
theory.  Numerous  subjects  listened  to  the  original  and  reconstructed  signals  and  could  not 
distinguish  between  them. 

As  seen  from  Figures  28  through  33,  the  errors  produced  from  the  Gabor  representa¬ 
tions  are  much  greater  and  different  in  nature.  Figure  28  illustrates  the  reconstructed  signal 
from  the  Gaussian  Gabor  spectrum,  that  is,  using  the  maximally  supported  biorthogonal 
function  of  the  Gaussian  window.  The  first  and  last  few  time  frames  were  omitted  in 
the  reconstruction  process  in  order  to  avoid  out  of  range  indexing  in  the  data  structure 
used  to  hold  the  values  of  the  reconstructed  signal.  Omitting  these  time  frames  created 
two  unexpected  spikes  at  the  beginning  and  end  of  the  reconstructed  signal.  These  errors 
foreshadow  the  errors  that  occur  when  the  Gabor  spectrum  is  compressed.  Figure  29 
illustrates  the  error  signal  produced  from  the  signals  plotted  in  Figure  28.  The  two 
large  spikes  in  the  reconstructed  signal  were  removed  prior  to  computing  that  difference. 
Besides  the  31.3  Hz  sinusoidal  error  observed  in  Figure  28,  a  handful  of  impulses  are  also 
seen.  Although  these  errors  are  much  greater  than  the  errors  produced  by  the  short-time 
Fourier  reconstructions,  they  were  not  perceived  by  the  listening  subjects. 

As  shown  in  Figure  31,  the  reconstructed  signal  from  the  short  Gaussian  Gabor 
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spectrum  has  large  periodic  spikes.  These  spikes  were  clearly  audible  in  the  listening 
tests.  Although  intermediate  values  of  Q  were  not  tested,  it  is  clear  that  subjective  quality 
of  the  reconstructed  signal  is  a  function  of  the  parameter  Q.  This  is  also  verified  by  the 
MSE  of  these  signals  shown  in  Table  4. 

It  appears  that  the  spikes  that  are  manifested  in  the  reconstruction  process  leak 
through  when  the  Gaussian  and  its  biorthogonal  function  do  not  entirely  cancel  each  other 
out.  The  large  spikes,  shown  in  the  lower  plot  in  Figure  28,  that  occurred  when  sections 
of  the  Gabor  coefficients  were  removed  support  this  conjecture. 

Figures  32  and  33  illustrate  the  result  of  reconstructing  the  speech  signal  from 
the  Gabor  representation  using  an  exponential  window.  Figure  33  shows  that  periodic 
(31.3  Hz)  spiky  errors  are  also  produced  in  this  reconstructed  signal.  The  arrows  point 
to  the  most  obvious  ones.  Also  notice  that  the  overall  error  tends  to  follow  a  decaying 
exponential  function;  the  same  as  the  window  function.  The  explanation  for  these  errors 
is  different  from  the  one  offered  in  the  Gaussian  case.  This  explanation  is  given  without 
proof  with  the  following  simple  example.  Suppose  a  modulated  one  sided  exponential 
function  is  defined  with  the  same  time  constant  as  the  exponential  window  used  in  a 
Gabor  expansion.  Furthermore,  suppose  that  this  function  is  exactly  shifted  in  time  by  an 
integer  multiple  of  a  =  At  and  modulated  by  an  integer  multiple  of  /?  =  A/.  A  Gabor 
decomposition  of  this  signal  will  result  in  a  single  impulse  in  the  two-dimensional  Gabor 
lattice.  This  impulse  results  from  the  perfect  match  between  the  signal  and  the  window 
shifted  and  modulated  at  precisely  the  same  values  as  the  signal.  Figure  34  precisely 
illustrates  this  phenomenon.  The  original  signal  plotted  on  the  top  left  hand  graph  is 
defined  as 

y{t)  =  exp[-(t  -  to)/'r]  .'in(27r/oO  u{t  -  to)  (75) 

where 


=  3 


^0 

/o 
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11 


Seven  Rectangular  Spectrogram  Seven  Original 


Figure  22.  Original  Signal  and  Reconstructed  Signal  from  Spectrogram  Using  a  Rectan¬ 
gular  Window 
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Difference  Original/Rectangular  Spectrogram 


Figure  23.  Difference  Signal  Between  Original  and  Reconstructed  from  Spectrogram 
Using  a  Rectangular  Window 
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Seven  Hamming  Spectrogram  Seven  Original 


Figure  24.  Original  Signal  and  Reconstructed  Signal  from  Spectrogram  Using  a  Ham 
ming  Window 
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Difference  Original/Hamming  Spectrogram 


Figure  25.  Difference  Signal  Between  Original  and  Reconstructed  ffoin  Spectrogram 
Using  a  Hamming  Window 
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Figure  26.  Original  Signal  and  Reconstructed  Signal  from  Spectrogram  Using  a  Gaussian 


Window 
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Difference  Original/Hamming  Spectrogram 
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Seven  Gaussian  Gabor  Seven  Original 
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Difference  Original/Gaussian  Gabor 


Figure  29.  Difference  Signal  Between  Original  and  Reconstructed  from  Gaussian  Gabor 
Spectrum 
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Figure  30.  Original  Signal  and  Reconstructed  Signal  from  Short  Gaussian  Gabor 
Spectrum 
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Difference  Original/Short  Gaussian  Gabor 


Figure  31.  Difference  Signal  Between  Original  and  Reconstructed  from  Short  Gaussian 
Gabor  Spectrum 
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fo  =  11 


r  =  1 

The  units  are  arbitrary.  This  signal  is  sampled  at  \  fT  =  64  and  =  64.  The  three 
dimensional  plot  illustrates  the  Gabor  decomposition  of  the  signal  and  shows  the  resulting 
impulse  centered  on  the  time  axis  at  m  =  3,  and  centered  on  the  frequency  axis  at 
k  =  11 — corresponding  exactly  with  to  and  fo  (the  indexes  in  the  Gabor  lattice  plots 
are  off  by  one  due  to  the  plotting  software).  The  bottom  left  hand  graph  shows  perfect 
reconstruction  as  expected. 

If,  however,  the  signal  is  not  shifted  and  modulated  at  precise  integer  multiples 
of  time  and  frequency,  respectively,  the  resulting  Gabor  representation  is  quite  different 
as  shown  in  Figure  35.  The  original  signal  is  shifted  and  modulated  by  to  =  3.3  and 
/o  =  11.2  in  this  case.  The  Gabor  lattice  now  shows  ringing  in  both  the  time  and 
frequency  axes.  This  ringing  is  due  to  the  uncertainty  produced  in  trying  to  pinpoint 
the  signal  in  time  and  frequency.  Indeed,  the  Gabor  representation  can  be  thought  as  a 
lA'o-dimencional  probability  density  function.  From  the  Gabor  coefficients  shown  plotted 
in  Figure  35,  the  reconstructed  signal  displays  the  uncertainty  produced  in  the  expansion 
of  the  original  signal.  Notice  that  a  small  error  occurs  at  n  =  192  (fo  =  3)  as  well  as  at 
subsequent  integer  multiples  of  n  =  64  (f).  This  same  phenomenon  is  exhibited  in  the 
error  signal  shown  in  Figure  33.  This  is  to  be  expected  since  naturally  occurring  speech 
signals  are  not  expected  to  have  temporal,  frequency,  and  envelope  characteristics  that 
perfectly  match  the  window  function. 

The  types  of  errors  shown  in  Figure  33  are  audibly  experienced  as  clicks.  These 
clicks  are  more  pronounced  in  some  words  than  in  others.  All  hearing  test  subjects 
reported  the  presence  of  the  clicks  in  the  reconstructed  signals.  However,  all  subjects 
reported  that  these  errors  were  not  as  obtrusive  as  the  errors  experienced  in  the  regenerated 
signals  from  the  short  Gaussian  Gabor  spectrum.  Since  audible  errors  resulted  from  using 
the  exponential  window,  no  compression/reconstruction  tests  were  performed  based  on 
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these  decompositions.  The  minimally  supported  biorthogonal  Gaussian  function,  on  the 
other  hand,  is  important  when  reconstructing  the  speech  signals  from  a  compressed  Gabor 
representation.  This  will  be  examined  later  in  this  section. 

Table  4  lists  the  MSEs  of  all  regenerated  signals  described  thus  far.  Subjectively, 
the  results  can  be  ranked  nearly  the  same  as  the  MSE  ranking.  The  only  difference  is  that 
the  first  four  entries  in  the  table  can  be  subjectively  ranked  with  equal  weight,  and  each  of 
these  are  indistinguishable  from  the  original  signal. 

Table  4.  MSE  of  Reconstructed  Signals 


Reconstruction  Source 

MSE 

Rectangular  Spectrogram 

Hamming  Spectrogram 

Gaussian  Spectrogram 

Gaussian  Gabor 

Exponential  Gabor 

Short  Gaussian  Gabor 

irfrfculHMiii 

The  next  series  of  plots  in  Figures  36  through  43  show  frame  9  (n  =  2304)  of  the 
short-time  Fourier  spectra  and  the  Gabor  spectra  for  each  window  used.  Frame  9  of  the 
signal  corresponds  to  the  first  vowel  e  {e,  from  the  International  Phonetic  Alphabet)  in  the 
word  seven.  Figure  36  demonstrates  that  the  spectrum  obtained  from  a  Hamming  window 
is  nearly  the  same  as  the  one  obtained  from  the  Gaussian  window.  One  would  expect  that 
from  the  similarity  of  the  two  window  functions.  On  the  other  hand  the  Gabor  spectra  are 
quite  different,  as  shown  in  Figure  37.  The  Gaussian  window  in  this  case  produces  much 
more  energy  below  150  Hz  or  so.  In  addition,  the  Gaussian  Gabor  decomposition  tends 
to  attenuate  the  energy  of  the  higher  formants.  Unexpectedly,  the  one-sided  exponential 
window  produces  a  Gabor  spectrum  much  closer  to  the  short-time  Fourier  spectrum. 
Again,  these  examples  are  representative  of  the  effects  observed  in  the  entire  vocabulary. 

Figures  38  through  43  show  how  the  LIN  eliminates  spectral  components  as  a 
function  of  competitive  bandwidth  rule  and  threshold.  The  spectral  lines  shown  in  these 
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Seven  Exponential  Gabor  Seven  Original 


Figure  32.  Original  Signal  and  Reconstructed  Signal  from  Gabor  Spectrum  Using  an 
Exponential  Window 
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Difference  Original/Exponential  Gabor 


Figure  33.  Difference  Signal  Between  Original  and  Reconstructed  from  Gabor  Spectrum 
Using  an  Exponential  Window 
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Figure  34.  Exponential  Gabor  Decomposition/Reconstructioi;  Example  of  Aligned  Sig¬ 
nal  in  Time  and  Frequency 
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ime  (n/64  units) 


Figure  35.  Exponential  Gabor  Decomposition/Reconstitiction  Example  of  Unaligned 
Signal  in  Time  and  Frequency 
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Figure  36.  Frame  9  of  Spectrogram  for  Each  Window 
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Figure  37.  Frame  9  of  Gabor  Spectrum  for  Each  Window 
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figures  represent  the  normalized  output  of  the  LIN  after  convergence.  One  may  recall  that 
the  competitive  bandwidths  used  to  approximate  the  ERB  are  more  conservative  than  the 
CB  (see  Figure  19).  As  a  result  the  CB  rule  produces  more  compression  than  the  ERB 
rule.  The  LIN  output  function  thresholds  used  were  6  =  .0009,  .09,  and  are  referred  to  as 
the  low  threshold  and  high  threshold  in  Figures  38  through  43.  The  high  threshold  in  all 
compressions  tends  to  eliminate  the  higher  frequency  components — above  1500  Hz — due 
to  their  low  energy.  This  is  evident  in  Figures  40  and  43.  Although  it  may  be  difficult  to 
determine  from  the  figures,  the  LIN  consistently  finds  the  local  maxima  as  defined  by  the 
competitive  bandwidths.  These  local  maxima  correspond  the  the  glottal  frequency  (the 
first  maximum  chosen)  and  its  harmonics,  including  the  the  formant  peaks. 

Before  describing  the  results  of  regenerating  the  signal  from  the  compressed  spectra, 
the  time-aliasing  effects  described  in  the  previous  chapter  are  shown  in  Figures  44  and 
45  for  two  cases.  Figure  44  shows  the  plot  of  the  reconstructed  and  difference  signal 
from  the  compressed  Fourier  spectrum  using  a  Hamming  window,  and,  similarly,  the  plots 
shown  in  Figure  45  are  derived  from  the  compressed  Gaussian  Gabor  Spectrum.  In  th^ 
former  case,  the  errors  can  be  described  as  large  amplitude  errors  concentrated  around  t! ' 
window  edges,  whereas  in  latter  case,  the  errors  can  be  described  as  spiky  noise.  The 
spiky  noise  in  the  Gabor  case  was  predicted  earlier  in  the  discussion  pertaining  to  Figure 
28.  Again,  the  conjecture  is  that  the  spikes  of  the  biorthogonal  function  seem  to  leak 
through  if  the  window  function  does  not  entirely  cancel  them  out. 

The  windowing  process  described  in  Eq  (36)  completely  eliminates  the  time¬ 
aliasing  effects  in  the  case  of  the  short-time  Fourier  spectra  Consequently,  the  best  results 
were  obtained — both  subjectively  and  in  terms  of  MSE — from  these  representations.  The 
compressed  Gabor  spectra  did  not  reproduce  signals  as  well  the  Fourier  spectra,  even 
with  windowing.  However,  all  signal  reproductions  revealed  that  a  truncated  biorthogonal 
function  should  be  used  when  compressing  the  Gabor  spectrum.  This  is  illustrated  in 
Figures  (46)  and  (47).  The  spike  noise  cannot  be  fully  eliminated  when  a  maximally 
supported  biorthogonal  function  is  used.  With  a  truncated  biorthogonal  function,  spikes 
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Figure  40.  Compression  of  Spectrogram  Using  CB  Rule  and  High  Threshold 
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Frame  9  Compressed  Short  Gaussian  Gabor  Frame  9  Compressed  Gaussian  Gabor 


Figure  41.  Compression  of  Gabor  Spectrum  Using  ERB  Rule  and  Low  Threshold 
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Frame  9  Compressed  Short  Gaussian  Gabor  Frame  9  Compressed  Gaussian  Gabor 


Figure  42.  Compression  of  Gabor  Spectrum  Using  CB  Rule  and  Low  Threshold 
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Frame  9  Compressed  Short  Gaussian  Gabor  Frame  9  Compressed  Gaussian  Gabor 


Difference  Original/Compressed  Compressed  Hamming  Spectrogram 


Difference  Original/Compressed  Gaussian  Gabor  Compressed  Gaussian  Gabor 


Figure  45.  Aliasing  Effects  Obtained  from  Compressed  Gabor  Spectrum  Using  a  Gaus¬ 
sian  Window 
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do  not  appear,  however,  the  quality  of  the  reconstructed  speech  can  be  characterized  as 
rough  sounding. 

As  previously  shown,  the  amount  of  compression  can  be  controlled  by  either 
increasing  the  competitive  bandwidths  or  increasing  the  output  threshold  of  the  LIN.  The 
reconstructed  speech  obtained  from  the  compressed  spectra  using  a  LIN  tuned  to  the 
ERB  rule  was  virtually  indistinguishable  (subjectively)  from  the  speech  obtained  from 
the  compressed  spectra  using  the  CB  rule.  This  is  true  only  for  a  low  output  threshold. 
Increasing  the  LIN  output  threshold  tends  to  degrade  the  fricatives.  However,  this  is 
preferable  to  increasing  the  competitive  bandwiuths  much  beyond  the  CB  rule.  Speech 
compressed  using  the  LIN  tuned  to  twice  the  CB  rule  produced  markedly  deteriorated 
reconstructed  speech.  This  result  supports  the  argument  that  CB  rule  better  describes  the 
filter  bandwidth  characteristics  of  the  cochlea  than  the  ERB  rule. 

The  best  results,  overal’,  were  produced  from  the  short-time  Fourier  spectrum  using 
a  compactly  supponed  Gaussian  window.  Also,  unexpectedly,  better  compression  resulted 
from  this  representation  than  any  other  representation  compressed  with  an  identically  tuned 
LIN.  Table  5  lists  the  compression  ratios  obtained  using  a  LIN  tuned  to  the  CB  and  twice 
the  CB  rule  with  two  values  of  output  threshold  for  each  window  used.  The  compression 
ratio  is  defined  as 

Rc  =  Nt/N,  (76) 

where  is  the  total  number  of  coefficients,  and  Nc  are  the  total  number  of  coefficients 
left  after  compression. 

Speech  compressed  using  the  LIN  tuned  to  the  CB  rule  produced  clearly  intelligible 
speech  when  resynthesized,  even  when  a  high  LIN  output  threshold  was  used.  Table  6 
lists  the  MSEs  obtained  from  the  respective  compressed  spectra  using  the  CB  rule  and 
an  output  threshold  0  =  .09.  As  before,  the  subjective  ranking  coincides  with  the  MSE 
ranking. 

Although  the  reconstructed  speech  from  the  compressed  short-time  Fourier  spectrum 
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Table  5.  Compression  Ratios  Obtained  from  Short-Time  Fourier  Spectrum 


Compression  Ratios  (Re) 

LIN  Tuning 

Window 

Rule 

Threshold 

Rectangular 

Hamming 

Gaussian 

CB 

.0009 

8.84 

11.23 

11.77 

CB 

.0900 

18.08 

23.55 

24.72 

2xCB 

.0009 

14.95 

18.54 

19.40 

Table  6.  MSE  of  Reconstructed  Signals  from  Compressed  Spectra 


Reconstruction  Source 

MSE 

Gaussian  Spectrogram 

Hamming  Spectrogram 

■KiTdaiiriBii 

Rectangular  Spectrogram 

Short  Gaussian  Gabor 

Gaussian  Gabor 

was  intelligible,  it  can  not  be  claimed  that  this  speech  is  of  toll  quality.  In  tests  performed 
on  continuous  speech,  musical  quality  tones  are  perceived  around  some  fricatives  in  the 
reconstructed  speech.  The  phrase  processed  in  these  tests  was  presented  in  Chapter  I, 
Section  Scope. 

This  section  concludes  with  an  estimate  of  the  bit  rate  achieved  by  compressing 
the  Gaussian  window  derived  Fourier  spectrum  of  the  test  phrase.  No  source  or  channel 
coding  is  assumed  in  the  bit  rate  estimate.  The  LIN  was  tuned  with  the  CB  rule  and  an 
output  threshold  of  .09.  A  compression  ratio  of  26.81  was  achieved  in  this  case.  This 
translates  to  an  average  of  4.774  coefficients  per  frame.  Since  the  frames  are  overlapped 
by  50%,  an  average  of  approximately  10  coefficients  per  31.95  msec  was  sufficient  to 
represent  the  signal.  Approximately  31  frames  are  needed  for  each  second  of  speech. 
In  order  to  find  the  bit  rate,  the  total  number  of  bits  that  represent  the  frequency  index, 
magnitude,  and  phase  of  each  coefficient  is  established.  The  total  number  of  bits  needed 
for  the  frequency  index  is  fixed  at  7.  The  number  of  bits  used  to  represent  the  magnitude 
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and  phase  were  decreased  to  4  and  2  respectively;  down  from  16  bits  each.  This  reduction 
in  the  number  of  bits  did  not  significantly  degrade  the  speech  signals  (neither  in  MSE  nor 
subjectively).  Thus,  the  total  number  of  bits  necessary  to  represent  each  coefficient  is  13, 
times  10  coefficients  per  frame,  times  31  frames  per  second  yields  4030  bits  per  sec.  This 
compares  favorably  with  the  4800  bit  rate  speech  demonstrated  by  McMillan  [49]. 

Results  Based  on  Affine  Wavelet  Spectra 

As  in  the  previous  section,  the  resynthesized  signal  from  all  the  coefficients  of  the 
affine  wavelet  spectrum — using  the  Haar  and  the  Morlet  wavelets — are  first  examined. 
Afterwards,  the  result  of  compressing  the  affine  wavelet  spectrum  via  the  LIN,  as  defined 
in  the  previous  chapter,  is  presented. 

The  regenerated  signal  from  the  Haar  wavelet  obtained  coefficients  is  nearly  identical 
to  the  original  signal,  as  shown  in  Figure  48.  The  error  obtained  from  this  reconstruction 
is  almost  the  same  as  the  error  obtained  from  the  signal  resynthesized  from  the  short-time 
Fourier  spectrum  using  the  rectangular  or  Hamming  windows.  The  high  fidelity  of  this 
reconstructed  signal  is  to  be  expected  due  to  the  completeness  of  the  Haar  wavelet  set. 

In  the  previous  chapter,  it  was  mentioned  that  four  times  as  many  Morlet  wavelet 
levels  than  are  used  in  this  effort  are  necessary  for  a  complete  representation.  As  a  result, 
a  large  error  between  the  original  and  the  reconstructed  signal  from  the  Morlet  wavelet 
decomposition  is  to  be  expected.  Figure  49  verifies  this.  However,  what  was  not  expected 
is  that  the  subjective  quality  of  this  reconstructed  signal  is  quite  good,  although  different 
from  the  original  and  somewhat  noisier. 

The  Gaussian  envelope  of  the  Morlet  wavelet-exp[— (n/A^)^/2],  as  presented  in 
Eq  (61) — has  a  variance  =  1  if  one  defines  the  general  form  of  the  Gaussian  as 
exp[-t^/(2cT^)].  In  this  form,  the  Morlet  wavelet  produces  quite  a  large  overlap  during 
the  computation  of  the  wavelet  transform.  The  variance  was  reduced  to  =  l/(2:r) — in 
order  to  reduce  the  overlap — and  the  signals  were  decomposed  and  resynthesized  with 
the  new  wavelet  set  The  result,  represented  in  Figure  50,  was  a  reduction  in  the  MSE 
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between  the  original  and  reconstructed  signals.  The  listening  test  subjects  reported  that 
these  resynthesized  signals  sounded  closer  to  the  original  than  the  signals  resynthesized 
from  the  wider  variance  Morlet  wavelet  decompositions.  Table  7  lists  the  MSEs  of  the 
three  regenerated  signals  described  in  this  section. 

Table  7.  MSE  of  Reconstructed  Signals  From  Wavelet  Coefficients 


Reconstruction  Source 

MSE 

Haar  Spectrum 

Morlet  Spectrum 

■tWMBnaai 

Narrow  Morlet  Spectrum 

The  reconstructed  speech  from  the  compressed  affine  spectrum — by  the  method 
described  in  Chapter  II — was  intelligible  but  of  poor  quality,  regardless  of  the  wavelet 
used.  This  reconstructed  speech  has  a  very  rough  quality  associated  with  it.  Even  when 
reducing  the  compression  by  allowing  only  nearest  neighbor  competition  in  the  LIN,  the 
resultant  reconstructed  speech  is  rough  and  muffled.  Figures  51  through  53  show  the  result 
of  compressing  all  eight  Haar  generated  coefficiciit  levels  of  frame  9  of  the  speech  signal 
using  the  CB  rule.  At  each  level,  the  coefficients  have  been  duplicated  an  appropriate 
number  in  order  to  span  128  points.  For  example,  at  level  0,  the  only  coefficient  found 
at  that  level  is  duplicated  128  times,  at  level  1,  both  coefficients  found  at  that  level  are 
duplicated  64  times,  and  so  on.  This  provides  a  convenient  method  of  illustrating  the 
affine  spectrum.  Recall  that  the  128  points  are  associated  with  a  31.95  msec,  interval. 
The  result  of  compressing  the  coefficients  are  shown  as  shaded  areas  in  the  plots.  Figures 
54  through  55  show  the  result  of  compressing  the  same  frame  with  only  nearest  neighbor 
competition.  Clearly,  this  method  of  compressing  the  affine  spectrum  is  not  useful  for 
regenerating  good  quality  speech. 
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Difference  Original/Haar  Wavelet  Spectrum  From  Haar  Wavelet  Spectrum 


Figure  48.  Reconstructed  and  Difference  Signal  from  Haar  Wavelet  Coefficients 
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Difference  Original/Narrow  From  Narrowo^  Morlet  Wavelet  Spectrum 


Figure  5 1 .  Levels  0-2  of  Frame  9  of  Haar  Wavelet  Decomposition 
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Figure  52.  Levels  3-5  of  Frame  9  of  Haar  Wavelet  Decomposition 
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Figure  54.  Levels  3-5  of  Frame  9  Nearest  Neighbor  LIN  Compression  Result 
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Frame  9,  Level  7  Coefficients  Frame  9,  Level  6  Coefficients 


Noise  Filtering  Analysis 

Preliminary  tests  suggest  that  the  use  of  LINs  for  noise  filtering  neither  enhances 
nor  degrades  the  intelligibility  of  the  speech  signal.  With  high  signal-to-noise  ratios 
(SNRs)  the  speech  was  equally  intelligible  before  and  after  compression.  Similarly,  when 
the  SNR  was  lowered  significantly,  the  speech  was  not  understood  either  before  or  after 
compression.  Although  this  is  not  a  dramatic  result,  it  is  significant  when  comparing 
with  LPC  coded  speech.  It  is  well  known  that  LPC  coded  speech  degrades  very  rapidly 
(nonlinearly)  as  a  function  of  decreasing  SNR.  Thus,  if  the  speech  signal  is  understandable, 
no  extra  noise  filtering  is  required  before  the  speech  is  compressed  with  LINs  as  is  required 
for  LPC  compressed  speech. 

Summary  of  Results 

The  results  reported  in  this  chapter  were  for  the  spoken  word  seven,  and,  again, 
these  results  are  representative  of  the  results  found  for  the  rest  of  the  spoken  digits. 

In  the  case  where  speech  is  decomposed  using  the  short-time  Fourier  and  Gabor 
transforms,  the  LIN  is  capable  of  finding  all  relative  maxima  that  are  associated  with  the 
glottal  frequency  and  its  harmonics,  and  most  importantly  the  formant  peaks.  When  the 
Gabor  decomposition  was  used,  it  was  found  that  the  quality  of  the  reconstructed  speech 
depends  on  the  variable  Q,  which  defines  the  extent  of  the  biorthogonal  function.  Small 
values  of  Q  are  preferable  than  large  values  of  Q  in  this  case,  but  just  the  opposite  is  true 
if  all  the  Gabor  coefficients  are  used.  A  one-sided  exponential  window  produces  periodic 
clicks  in  the  reconstructed  signal,  even  when  all  coefficients  are  used.  The  reconstructed 
speech  from  the  compressed  short-time  Fourier  spectrum  is  of  much  better  quality  than 
the  Gabor  reconstructed  speech.  Of  the  three  windows  used  in  the  short-time  Fourier 
decomposition,  the  Gaussian  window  produced  the  best  quality  reconstructed  speech. 
Speech  reconstructed  from  the  LIN  compressed  Gaussian  spectrum  using  the  ERB  rule 
is  virtually  indistinguishable  from  the  LIN  compressed  spectrum  using  the  CB  rule. 
The  CB  rule  appiears  to  define  the  limit  of  the  competitive  bandwidths  since  increasing 
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the  bandwidths  beyond  this  limit  significantly  degrades  the  quality  of  the  reconstructed 
speech.  Overall,  compression  ratios  between  approximately  20  and  28  were  achieved 
resulting  in  perfectly  understandable  speech.  This  amount  of  compression  translates  to 
well  below  4.8  kbits/sec.  speech. 

The  LIN,  as  designed  in  this  thesis,  does  not  compress  the  affine  wavelet  spectrum 
suitably  for  resynthesizing  good  quality  speech.  Even  in  efforts  where  compression  was 
allowed  in  only  one  level,  the  resultant  regenerated  speech  was  sufficiently  degraded  in  a 
manner  that  was  consistently  rated  worse  than  the  regenerated  speech  from  the  compressed 
short-time  Fourier  spectrum.  In  these  cases  the  compression  of  the  Fourier  spectrum  was 
nearly  20  times  greater  the  the  compression  of  the  affine  spectrum. 

In  the  concluding  chapter,  the  implications  of  these  results,  especially  as  they  relate 
to  the  physiology  of  hearing,  are  explored.  In  addition,  suggestions  as  to  how  the  quality 
of  the  reconstructed  speech  can  be  improved  from  the  compressed  spectra  explored  in  this 
thesis  are  offered. 
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V.  Conclusions  and  Recommendations 


Introduction 

The  main  purpose  of  this  research  was  to  develop  non-linear  lateral  inhibition 
networks  (LINs)  for  coding  and  compressing  the  affine  and  W-H  decompositions  of 
speech.  The  choice  of  LINs  for  compression  is  biologically  motivated  since  these 
networks  are  ubiquitous  in  sensory  preprocessing  systems  across  species  for  the  main 
purpose  of  spatio/temporal  contrast  enhancement.  In  the  W-H  case,  the  LIN  is  designed 
to  search  for  spectral  peaks  and  eliminate  the  rest  of  the  frequency  components  in  each 
time  slice  of  the  two-dimensional  transform  lattice.  In  the  affine  wavelet  transform  space, 
the  LIN  is  designed  to  pick  temporal  coefficient  peaks  in  each  level  of  frequency  analysis. 
The  LIN  was  tuned  to  approximate  the  overlapping  bandwidths  of  the  theoretical  filter 
model  of  the  cochlea. 

In  the  following  section,  the  most  important  results  of  this  research  are  summarized. 
From  these  results,  conclusions  and  recommendations  for  possible  future  research  are 
suggested.  This  chapter  and  this  thesis  concludes  with  the  biological  implications  of  the 
results  found  in  this  research. 

Summary  of  Results,  Conclusions,  and  Recommendations 

The  best  overall  decomposition/compression/resynthesis  results  were  obtained  from 
the  short-time  Fourier  domain.  In  that  case,  up  to  approximately  28  times  compression 
of  the  short-time  Fourier  spectrum  was  achieved  resulting  in  clearly  intelligible  sp'^ech. 
This  amounted  to  approximately  95%  elimination  of  the  spectrum.  The  resynthesized 
speech  compressed  at  these  levels  translates  to  under  4.8  kbits/sec.  speech.  Although 
the  compressed  speech  is  perfectly  understandable,  this  speech  is  not  of  toll  quality 
due  to  the  musical  quality  of  the  fricatives.  The  compression  ratios  achieved — for  a 
specific  tuning  of  the  LIN-were  a  function  of  the  window  used  in  the  Fourier  expansion. 
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The  compactiy  supported  Gaussian  window  produced  the  best  compression,  followed 
by  the  Hamming  window,  and  the  rectangular  window  in  that  order.  If  one  examines 
the  spectral  characteristics  of  each  window  [30],  one  might  conclude  that  this  is  due  to 
the  side  lobe  characteristics  of  each  window.  All  other  things  being  equal,  the  more 
the  side  lobes  are  suppressed,  the  more  the  LIN  compresses.  This  hypothesis  can  be 
tested  in  future  research  by  using  other  windows  with  even  better  side  lobe  characteristics 
than  the  compactly  supported  Gaussian  window,  in  particular  the  Dolph-Chebyshev  and 
Kaiser-Bessel  windows. 

Compressing  the  Gabor  spectrum  produced  much  noisier  resynthesized  speech  than 
what  was  obtained  from  the  short-time  Fourier  domain.  It  is  concluded  here,  that  the 
poor  compression/resynthesis  obtained  from  the  Gabor  decomposed  speech  is  not  due  to 
the  LIN  compression.  In  each  case,  the  LIN  found  peaks  that  were  associated  with  the 
important  characteristics  of  speech:  the  glottal  frequency  and  its  harmonics  as  well  as 
the  formant  peaks.  Therefore,  the  problem  in  the  case  of  the  Gabor  expansion  is  not  in 
the  information  the  LIN  is  extracting  but  the  fact  that  components  are  being  eliminated. 
The  conjecture  is  that  errors  arise  due  to  problems  in  approximating  and  truncating 
the  biorthogonal  function  of  the  window  in  the  Gabor  expansion.  Further  elimination 
of  spectral  components  may  produce  Gibbs  like  errors  that  cannot  be  eliminated  with 
windowing.  Because  of  these  errors  and  the  extra  computational  complexity  of  the  Gabor 
expansion,  this  representation  is  not  recommended  as  an  appropriate  representation  for 
speech  compression. 

The  LIN  compressed  tiffine  wavelet  spectrum  does  not  produce  good  quality 
resynthesized  speech.  Regardless  of  whether  the  Haar  or  Morlet  wavelet  was  used  to 
decompose  speech,  the  signals  regenerated  from  the  compressed  representations  are  quite 
rough  sounding.  This  was  found  to  be  the  case  even  when  the  LIN  was  tuned  for 
minimal  competition,  that  is,  only  nearest  neighbor  competition.  It  appears  that  Mallat’s 
method  [45]  of  regenerating  the  coefficients  via  convex  space  projections  is  necessary 
to  reproduce  good  quality  reconstruction.  This  process  requires  that  at  least  the  local 
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maxima  are  chosen.  The  LIN  chooses  local  maxima  that  do  not  necessarily  coincide  with 
the  local  maxima  defined  by  Mallat.  Therefore,  if  the  LIN  is  to  be  used  for  this  purpose, 
the  network  must  be  redesigned. 

Even  if  a  LEN  can  be  designed  to  choose  the  appropriate  maxima  and  the  projection 
method  is  used  to  regenerate  the  speech  signals,  there  appears  to  be  at  least  two  fundamental 
problems  with  the  affine  wavelet  representation  for  speech  compression.  The  first  is  the 
limitation  of  compression  that  may  be  achieved  through  the  process  described  above.  As 
may  be  seen  from  Figures  51  through  55,  the  high  rate  of  variation  in  the  speech  signal 
produces  a  high  rate  of  variation  in  the  affine  spectrum.  As  a  result  of  this,  relatively 
few  components  can  be  classified  as  non-maxima  and  thereby  eliminated.  As  a  rough 
estimate,  only  25%  of  the  coefficients  of  the  speech  signals  processed  in  this  thesis  fall  in 
that  category.  This  is  far  less  than  the  number  of  coefficients  that  can  be  eliminated  in  the 
short-time  Fourier  space.  In  addition,  a  real-time  convex  space  projection  system  would 
require  a  highly  parallelized  and  expensive  system. 

Because  of  the  expense  of  the  convex  space  projection  method,  a  simple  cubic 
spline  or  Hermite  polynomial  first  cut  interpolation  of  the  coefficients  should  be  tested. 
This  may  be  all  that  is  required  to  dramatically  improve  the  reconstructed  speech.  This 
method  should  also  be  tried  in  the  compressed  short-time  Fourier  spectrum,  for  it  might 
eliminate  the  musical  quality  of  the  the  fricatives  of  the  reconstructed  speech. 

There  are  other  implementation  issues  which  must  also  be  addressed.  In  this 
thesis,  the  LIN  was  modeled  as  a  system  of  non-linear  differential  equations  whose 
solutiony  are  approximated  by  Euler’s  method.  This  is  not  the  most  efficient  method  of 
implementation,  however,  it  established  a  baseline.  As  may  be  recalled,  in  Chapter  III,  a 
linear  approximation  to  the  LIN  may  be  implemented  as  overlapped  Mexican-hat  function 
digital  filters.  In  fact,  this  can  be  reformulated  as  a  wavelet  process.  This  approach 
would  significantly  reduce  the  computational  load.  This  suggests  that  an  affine  wavelet 
decomposition  could  be  used  on  the  short-time  Fourier  spectrum  as  the  compression 
algorithm.  A  number  of  implementations  are,  therefore,  possible  when  viewed  in  this 
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manner.  In  spite  of  this,  it  must  be  realized  that  linear  approximations  of  non-linear  LIN 
dynamics  will  not  produce  the  same  computational  results.  Whether  these  approximations 
will  yield  results  that  sound  acceptable  to  listeners  should  be  established.  In  addition  to 
the  digital  implementations,  analog  VLSI  and  CMOS  implementations  of  LINs  are  also 
available  [52, 41].  These  circuits  should  satisfy  the  most  stringent  real-time  requirements. 

Biological  Implications 

In  the  previous  chapter  it  was  reported  that  nearly  95%  of  the  frequency  components 
of  the  short-time  Fourier  spectrum  can  be  eliminated  without  producing  a  significant  assault 
to  the  hearing  mechanisms,  whereas  eliminating  even  a  small  number  of  coefficients  in 
the  affine  wavelet  spectrum  produces  errors  that  the  hearing  mechanism  does  not  tolerate. 
What  are  the  differences  in  the  types  of  errors  that  occur  when  compressing  the  respective 
spectra,  and  how  do  these  affect  the  hearing  mechanism?  The  answer  to  the  first  question 
is  straightforward.  However,  one  must  hypothesize  when  answering  the  second  question. 

It  was  discussed  in  Chapter  III  that  time  aliasing  errors  occur  when  the  Fourier 
spectrum  is  nonuniformly  sampled  below  the  Nyquist  rate.  However,  these  errors  can  be 
suppressed  with  windowing.  On  the  other  hand,  compressing  the  affine  wavelet  spectrum 
produces  random  or  high  frequency  noise  [45,  3].  The  author  of  an  Aware,  Inc.  technical 
report  writes: 


Fourier-based  spectral  techniques  [in  compression]  tend  to  produce  errors 
of  the  aliasing  type  since  the  frequency  spectrum  itself  is  distorted,  while 
wavelet  methods  tend  to  produce  errors  of  the  noisy  type.  Noise  is  far  less 
offensive  to  the  human  visual  system  than  aliasing.  [3:40] 


Evidently,  just  the  opposite  is  true  for  the  auditory  system.  Why  this  should  be  the  case  is 
now  explored. 

Until  quite  recently,  the  role  of  the  efferent  or  descending  auditory  system  was  little 
more  than  a  mystery.  There  remains  a  general  lack  of  consensus  as  to  the  function  of 
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this  system;  however,  recent  hypotheses  seem  to  be  converging  towards  the  following 
oversimplified  process  [2,  8,  29,  39,  69].  As  may  be  recalled  from  Chapter  II,  acoustic 
signals  produce  frequency  dependent  vibratory  motion  at  specific  locations  along  the 
basilar  membrane.  This  vibratory  motion  is  transduced  into  electric  potentials  by  the  inner 
hair  cells  located  on  the  basilar  membrane.  The  electric  potentials  in  turn  activate  the 
cochlear  nerve  fibers  that  ultimately  signal  the  primary  auditory  fields  of  the  neocortex 
via  several  brainstem  and  thalamic  nuclei.  This  system  is  known  as  the  afferent  or 
ascending  system.  Thus,  the  afferent  system  acts  as  the  primary  auditor'  receptor  system 
by  transducing  and  encoding  acoustic  phenomena,  and  relaying  that  information  to  the 
brain. 

The  brain  seems  to  actively  control  this  process  via  the  efferent  system.  A 
descending  communication  system,  separate  from  but  parallel  to  the  ascending  system, 
terminates  in  the  brain  at  four  nuclei — two  on  each  side  of  the  brain  stem — called  the 
lateral  and  medial  superior  olivary  nuclei.  These  nuclei  reside  just  above  and  ventral 
to  the  cochlear  nucleus,  the  first  relay  station  of  the  afferent  system.  Neurons  from  the 
olivary  nuclei  ultimately  communicate  with  the  cochlea  through  the  olivocochlear  bundle. 
The  axons  of  this  transmission  line  bifurcate  and  either  synapse  with  the  dendrites  of  the 
afferent  system  (near  the  inner  hair  cells)  or  synapse  directly  with  the  outer  hair  cells. 
Functionally,  axodendritic  stimulation  appears  to  inhibit  inner  hair  cell  signaling.  Outer 
hair  cell  stimulation  produces  much  more  complicated  and  interesting  results.  Evidence 
suggests  that  outer  hair  cell  stimulation  produces  structural  changes  to  the  cell  resulting  in 
a  mechanical  action  on  the  tectorial  membrane.  This  electro-mechanical  action  acts  as  an 
automatic  gain  control  of  the  afferent  system.  Kim  elucidates  with  the  following  quotation 


The  function  of  the  OHCs  [outer  hair  cells]  is  to  enhance  actively  the 
sensitivity,  tuning,  and  dynamic  range  of  the  mechanical  response  of  the 
entire  organ  of  Corti . . . ,  conferring  high  sensitivity,  sharp  tuning,  and  a  wide 
dynamic  range  to  the  IHC  [inner  hair  cell]  subsystem.  [39] 
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and  Wiederhold  adds: 


The  efferents  could  also  serve  in  a  control  system,  possibly  driven  from 
cortical  origins  of  the  descending  auditory  pathways,  to  filter  out  acoustic 
signals  that  are  distracting,  or  to  make  more  intelligible  those  stimuli  of 
particular  interest.  [69] 

This  last  quotation  may  finally  shed  light  on  the  question  left  unanswered  in  the 
beginning  of  this  section:  why  are  the  errors  produced  by  compressing  the  Fourier 
spectrum  of  speech  tolerated  by  the  hearing  mechanism  and  not  the  errors  produced  by 
the  compressed  affine  wavelet  spectrum  of  speech?  To  aid  the  argument.  Figure  56 
shows,  from  top  to  bottom,  the  plots  of  the  short-time  Fourier  spectrum  of  frame  1 8  (see 
previous  chapter)  of  the  original  signal,  the  reconstructed  signal  from  the  compressed 
short-time  Fourier  spectrum  using  a  Gaussian  window,  and  the  reconstructed  signal  from 
the  compressed  Morlet  affine  wavelet  spectrum  respectively.  The  middle  plot  shows 
that  the  process  of  reconstruction  filled  in  some  of  the  components  eliminated  in  the 
compression  process.  Aliasing,  then,  amounts  to  localized  energy  reductions  of  the 
spectrum.  On  the  other  hand,  the  bottom  plot  verifies  that  broad  band  noise  is  produced 
when  the  affine  wavelet  spectrum  is  compressed.  Thus,  the  conjecture  is  that  the  brain 
actively  enhances  the  quality  of  the  compressed  Fourier  spectrum  reconstructed  speech 
by  boosting  the  gain  at  the  appropriate  locations  along  the  basilar  membrane.  The  brain 
is  incapable  of  enhancing  the  compressed  affine  spectrum  reconstructed  speech  to  the 
same  degree  because  the  error  is  spread  along  the  entire  length  of  the  basilar  membrane. 
Therefore,  the  brain  is  unable  to  correctly  enhance  or  attenuate  the  viL  dory  motion  of 
the  basilar  membrane  due  to  the  improper  cues.  Psycho/physiological  experiments  can  be 
designed  and  performed  that  test  these  hypotheses. 
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Appendix  A.  Code  for  Gabor  Transform 


/*  Written  by  Rick  Ricart,  Capt,  USAF  */ 
/*  Computes  the  Gabor  representation  of  speech  files  using  a  Gaussian  window  */ 
/*  The  sound  is  sampled  by  the  Codec  at  8012.8  Hz.  Each  samle  is,  therefore,  */ 
/*  taken  every  *24.8E-06  sec.  Delta  t  is  chosen  to  be  256*124 . 8E-06  or  */ 
/*  31.949E-03  sec.  Delta  f  is,  therefore,  31.3  Hz.  */ 
/*  This  version  uses  50%  overlapped  windows  */ 

/*  Call  program  as  follows:  gabor  <filename>  val,  where  val  is  0  or  1  */ 
/*  val“0  truncates  biorthogonal  function,  val-1  leaves  biorthogonal  function  */ 
/*  at  its  maximum  extent.  */ 


/********************^lr***********************«*,»****************«************/ 

♦import  <sound/sound.h> 

♦import  <math.h> 

♦import  <dsp/arrayproc.h> 

♦import  <mach.h> 

♦import  <stdlib.h> 

♦import  <stdio.h> 

♦import  <ob jc/ob jc .h> 

♦import  <soundkit/Sound.h> 

♦import  <soundkit/soundkit .h> 

♦import  <string.h> 

♦import  <macros.h> 

♦define  N_DELTA_T  256 
♦define  POINTS  2*N_DELTA_T 
♦define  K_TOTAL_DFS  128 
♦define  M_TOTAL_DTS  31 

♦define  TOT  M_TOTAL_DTS*N_DELTA_T  /*  approximately  one  sec  of  sound  */ 

♦define  TAU  1  /*  exponential  time  constant  */ 

♦define  PI  3.141592654 
♦define  ABS (x)  sqrt (pow (x, 2 . 0) ) 

♦  define  ABS2 (x)  ( (float) sqrt (( (double) x) *( (double) x) )  ) 


/*  Global  Variables  */ 
id  mySound,  newSound; 

SNDSoundStruct  * soundStruct ,  ‘convertStruct; 

/*  the  filenames  •/ 

char  magnitudefile [80] ,  phasefile [80] ,  signalfile [80] ; 

void  get_data (short  **temp,char  *infile) 

( 

int  error,  data_size,i; 

BOOL  edit; 

/* 

inPtr  =  (short  *)  ((char  •)  inputSound  +  inputSound->dataLocation) ; 
outPtr  -  (short  *)  ((char  *)  *outputSound  +  (‘outputSound) ->dataLocation) ; 

*/ 

error  ■  [mySound  readSoundfile : infile] ; 

/*  initialize  mySound  to  infile's  mySound  object  */ 
soundStruct  “[mySound  soundStruct); 

[mySound  isEditable]; 

data_3ize  “  soundStruct->dataSize; 

SNDAlloc (SconvertStruct, data_size, SND_FORMAT_LINEAR_16, SND_RATE_CODEC, 
soundStruct->channelCount, "") ; 


SNDConvertSound(soundStruct, tconvertStruct)  ; 

*temp”  (short  *)  (convertStruct+convertStruct->dataLocation)  ; 

) 

void  write_output_files  (short  ‘signal, float  ‘gaborMag, 
float  ‘gaborPhase,  float  ‘gamma, 
char  ‘infile,  char  ‘index) 

( 

FILE  ‘fl,‘f2,‘f3,‘f4; 
int  )c,m,n; 
size_t  s_le.n; 

/‘  get  length  of  input  file  not  counting  null  terminator  ‘/ 
s_len*strlen (infile) ; 

/‘  start  name  of  signalfile  with  infile  minus  suffix  .snd  ‘/ 
strncpy  (signalfile,  infile,  (s_len-4)  )  ; 

/‘  must  add  terminating  null,  strncpy  does  not  automatically  do  it  ‘/ 
signalfile [s_len-4] »' \0' ; 

/‘  start  name  of  magnitudefile  with  infile  minus  suffex  .snd  ‘/ 
strncpy (magnitudefile, inf ile, (s_len-4) ) ; 
magnitudefile [s_len-4 ]=' \0' ; 

/*  start  name  of  phasefile  with  infile  minus  suffex  .snd  ‘/ 
strncpy (phasefile, infile, (s_len-4) ) ; 
phasefile (s_len-41=' \0'  ; 

/*  add  the  appropriate  gamma  truncation  index  ‘/ 
scrcat (phasefile, index) ; 
strcat (magnitudefile, index) ; 

/‘  signalf ile=f ilename . dat  ‘/ 
strcat (signalfile, " .dat") ; 

/‘  magnitudef ile«f ilename_gaborMag. dat  ‘/ 
strcat (magnitudefile, "_gaborMag.dat") ; 

/‘  phasefile='filename_gaborPhase.dat  ‘/ 
strcat (phasefile, "_gaborPhase.dat") ; 

/‘  open  and  write  signal  data  to  signalfile  ‘/ 
if  ( (f 1  •  fopen (signalf ile, "w" ) )  «=  NULL) ( 

printf("\n‘“  Cannot  can't  create  %s  signalf  ile)  ; 

exit  (0) ; 

) 

loopn (TOT) 

fprintf (f 1 , "%d\n", signal (n) )  ; 

/‘  open  and  write  gaborMagnitude  data  to  magnitudefile  ‘/ 
if  ( (f2  =  fopen (magnitudefile, "w") )  ==  NULL) ( 

printf("\n‘“  Cannot  can't  create  %s  magnitudefile)  ; 

exit  (0)  ; 

) 

loopm (TOT) 

fprintf (f 2, " %f \n" , gaborMag (m) ) ; 

/‘  open  and  write  gaborPhase  data  to  phasefile  ‘/ 
if  ( (f3  ”  fopen (phasefile, "w") )  ««  NULL) { 

printf("\n‘“  Cannot  can't  create  %s  phasefile)  ; 

exit  (0) ; 

) 

loopm (TOT) 

fprintf (f3, "%f\n”, gaborPhase [m] )  ; 

if  ((f4  “  fopen ("garama0.dat", "w") )  "  NULL) ( 

printf("\n“‘  Cannot  can't  create  %s  "gamma0.dat")  ; 

exit  (0) ; 

) 

loopm (2‘TOT) 

fprintf(f4,"%f\n", gamma [m] ) ; ‘/ 

/‘  close  all  files  ‘/ 
fclose (f 1 ) ; 
fclose (f2) ; 
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f close (f 3) ; 

/*  fclose (f4) ; •/ 
) 


/*•*«•***•••**.  get_gaiiima  ( )  ***••*»***•**•/ 

/*  This  function  creates  the  biorthogonal  function 
of  the  Gaussian  window  •/ 

/•**<>*<t***********ii***»*********<i***(*<i««*^ 

void  get_ganima  (float  *ganiina,  int  index) 

{ 

int  i,n; 

double  )c, coeff.time; 
char  in_string(8] ; 


coef f”sqrt (sqrt (0.5))  ; 
k=l . 85407468; 

/*  create  biorthogonal  window  function  */ 
loopn (2*TOT) ( 

time= ( (double) (n-TOT) / (double) (N_DELTA_T) ) ; 

if ( ( ! index) *4 ( (n<TOT-2*N_DELTA_T)  I  I  (n>=TOT+2*N_DELTA_T) ) ) ( 

if  (n*=0) 

printf ("truncated  gamma  function,  index=0\n"); 
gamma [n] “O; 

) 

else{ 
if  (n==0) 

printf ("max  gamma  function,  index=l\n") ; 
loopi (3) 

gamma [n] +- (float)  (coeff* 
exp (PI*time*time-PI* 

pow (floor (ABS (time) +0 . 5) +0 . 5+ (double) i, 2 . 0) ) • 
pow(  (1</PI)  ,  (-3. 0/2.0)  )  * 

pow (-1.0, floor (ABS (time) +0 . 5+ (double) i) ) ) ; 

) 

) 

) 

/**•**•**••**  window_data  ***♦♦****••***••**»•*»/ 

/*  This  function  produces: 

S0M(over  q-0.  .  .M_DELTA_DTS)  f  (n+q*N_DEI,TA_T)  ‘gamma  (n+ (q-m)  *N_DELTA_T)  , 
for  a  given  time  slice  m,  and  where  f  is  the  signal  function  (timeReal)  and 
gamma  is  the  biorthogonal  function  of  the  window  */ 
/*«*********«***************«**********************^ 
void  window_data (float  ‘gamma,  float  ‘signal,  float  ‘timeReal) 

( 

I  define  A_ADR  DSPAPGetLowestAddress ()  /‘  address  of  signal  data  part  */ 

I  define  B_ADR  (A_ADR  +  N_DELTA_T)  /‘  address  of  biorthogonal  data  part  ‘/ 

#  define  C_ADR  (B_ADR  +  N_DELTA_T)  /‘  address  of  accumulated  sum  */ 

*  define  D_ADP  (C~ADR  +  N_DELTA_T)  /*  address  of  result  ‘/ 

I  define  INC  1  /‘  increment  for  all  array  ‘/ 

int  p; 


/*  DSPFix24  arrays  ‘/ 

DSPFix24  A_ARRAlf  [N_DELTA_T]  ,B_ARBAY[N_DELTA_T1  ,RES0LT[N_DELTA_T1  ; 

/‘  set  vectors  C  and  D  to  lero  ‘/ 

DSPAPvclear (C_ADR, INC, N_DELTA_T) ; 

DSPAPvclear  (D_ADR,  INC,  N_DE1,TA_T)  ; 

loopp (M_TOTAL_DTS) ( 

/*  Convert  data  from  float  to  DSPFix24  ‘/ 

DSPFloatToFix24Array (4signal (p*N_DELTA_T] ,  A_ARRAY,  N_DELTA  T) ; 
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DSPFlo«tToFix24Array  (*ganima[p*N_DELTA_Tl .  B_ARRAy,  N_DELTA_T)  ; 
/*  load  arrays  to  DSP  memory  */ 

DSPAPWriteFix24Array  (A_ARIlAy,  A_ADR,  INC,  N_DELTA_T)  ; 
DSPAPWriteFix24Array (B_ARRAY,  B_ADR,  INC,  N_DELTA_T) ; 

/*  swap  C  and  D  arrays  (answer  to  summing  array)  •/ 

DSPAPvswap (C_ADR, INC, D_ADR, INC,  N_DELTA_T)  ; 

/*  This  function  computes  A*B+C  and  puts  answer  in  D  */ 
DSPAPvtvpv  (A_ADR,  INC,P_ADR,  INC,C_ADR,  INC,D_ADR,  INC, N_DELTA_T)  ; 
) 

/*  Return  result  from  DSP  memory  to  host  memory  */ 
DSPAPReadFix24Array  (RESULT,  D_ADR,  INC,  N_DELTA_T) ; 

/*  Convert  data  from  DSPFix24  to  float  */ 

DSPFix24ToFloatArray (RESULT,  timeReal,  N_DELTA_T) ; 

) 


/•  squash  data  */ 

float  squash (float  val,  float  oldmax,  float  oldmin,  float  newmax,  float  newmin) 
( 

float  answer; 

answer* ( ( (val-oldmin) / (oldmax-oldmin) ) * (newmax-newmin) ) +newmin; 
return  answer; 

) 


void  fft_wind_data (float  ‘timeReal,  float  ‘timelmag,  float  ‘gaborReal, 
float  ‘gaborlmag) 

{ 

#  define  DATA_ADR  DSPAPGetLowestAddressXY ( ) 

#  define  COEF_ADR  (DATA_ADR  +  N_DELTA_T) 

I  define  IMAG^DATA  DSPMapPMemY (DATA_ADR) 

#  define  REAL_DATA  DSPMapPMemX (DATA_ADR) 

#  define  SIN_TABLE  DSPMapPMemY (COEF_ADR) 

«  define  COS_TABLE  DSPMapPMemX (COEF_ADR) 

float  ‘sinTab  =  DSPAPSinTable (N_DELTA_T) ; 
float  ‘cosTab  -  DSPAPCosTable (N_DELTA_T) ; 
int  i,m; 

/*  DSPFix24  arrays  */ 

DSPFix24  TimeReal (N_DELTA_T] , Timelmag (N_DELTA_T) , GaborReal [N_DELTA_T) , 
Gaborlmag [N_DELTA_T) ; 


/*  Squash  float  data  from  -1  to  1  */ 

/*  loopi (N_DELTA_T) ( 

timeReal ( i ) -squash (timeReal (i],lE6,-lE6,1.0,-1.0); 
timelmag [ i ) -squash (timelmag (i) ,1E6,-1E6,1.0,-1.0); 
)*/ 


/*  Convert  data  from  float  to  DSPFix24  */ 
DSPFloatToFix24Array (timeReal,  TimeReal,  N_DELTA_T) ; 
DSPFloatToFix24Array (timelmag,  Timelmag,  N_DELTA_T) ; 


/♦  put  the  time  domain  complex  array  */ 
DSPAPWriteFix2  4Array  (TimeReal,  REAL_DATA,  1, 
DSPAPWriteFix24Array (Timelmag,  IMAG_DATA,  1, 


N_DELTA_T)  ; 
N  DELTA  T) ; 
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/*  put  the  cos  and  sine  tables  */ 

DSPAPWriteFloatArray (cosTab,  COS_TABLE,  1,  N_DELTA_T/2) ; 
DSPAPWriteFloatArray  (sinTab,  SIN_TABLE,  1,  N_DELTA_T / 2 ) ; 

DSPAPfftr2a  (N_DELTA  T,  DATA_ADR,  COEF  ADR)  ; 


/* 

*  Get  the  gabor  domain  complex  array. 

*  Tell  monitor  to  read  the  array  bac)c  with  bit-reversed 

*  addressing  because  fftr2a  leaves  its  output  shuffled. 

*  Note  ^hat  the  skip  factor  (N_DEI,TA_T/2)  is  used  to  set  the 

*  DSP  'N'  register. 

*/ 

DSPSetDMAReadMReg (0) ; 

DSPAPReadFix24Array  (GaborReal,  REAL_DATA,  N_DELTA_T/2,  N_DELTA_T) ; 
DSPAPReadFix24Array (Gaborlmag,  IMAG_DATA,  N_DELTA_T/2,  N_DELTA_T) ; 
DSPSetDMAReadMReg (-1) ;  /*  re-select  linear  addressing  •/ 

/*  Convert  data  from  DSPFix24  to  float  */ 

DSPFix24ToFloatArray (GaborReal,  gaborReal,  N_DELTA_T); 
DSPFix24ToFloatArray (Gaborlmag,  gaborlmag,  N_DELTA_T) ; 

/*  Squash  float  data  from  -1  to  1  */ 
loopi (N_DELTA_T) { 

gaborReal (i ] -squash (gaborReal (il,1.0,-1.0,lE6, -1E6) ; 
gaborlmag [ i ) -squash (gaborlmag ( i ] , 1 . 0, -1 . 0, 1E6, -1E6) ; 
t 


) 


void  get_gabor_mag (float  ‘gaborReal,  float  ‘gaborlmag,  float  ‘gaborMag) 

( 

int  k,m; 

loopm (M_TOTAL_DTS‘2) 
loopk (K_TOTAL_DFS) 
gaborMag :m‘K_TOTAL_DFS+k ) - 

(float) sqrt ( (double) gaborReal [m‘N_DELTA_T+kl ‘ 

(double) gaborReal (m‘N_DELTA_T+kl + 

(double) gaborlmag [m‘N_DELTA_T+k ] ‘ 

(double) gaborlmag [m‘N_DELTA_T+kl ) ; 

) 

void  get_gabor_pha3e (float  ‘gaborReal,  float  ‘gaborlmag,  float  ‘gaborPhase) 
( 

int  k,m; 

loopm (M_T0TAL_DTS‘2) 
loopk (K_TOTAL_DFS) 

if  (  (gaborReal  (m‘N_DELTA_T+k]  —  0)  ti  (gaborImag(m‘N_DELTA_T+kl— 0)  ) 

gaborPhase [m‘K_TOTAL_DFS+k]- ( (float)PI/2) ; 

else 

gaborPhase (m‘K_TOTAL_DFS+k 1 - 

( ( float ) at an2 ( (double) gaborlmag (m‘N_DELTA_T+k) , 

(double) gaborReal [m‘N_DELTA_T+k) ) ) ; 

) 

main  (int  irgc,  char  “argv) 

( 

short  ‘temp;  /•  integer  representat ion  of  original  data  */ 

float  signal  [TOT+N_DELTA_T/'2I ;  /*  float  representation  of  original  data  */ 
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float  timeReal [2*T0T] ;  /*  the  overlapped  windowed  aignal  */ 
float  timeImag(N_DELTA_T] ;  /*  the  imaginary  portion  of  the  data  */ 
float  gaborReal (2*TOT) , gabor Imag [2*T0T] , gaborMag(TOT) , gaborPhase [TOT] ; 
float  gamma [2*T0T] ;  /*  the  biorthogonal  function  of  the  window  •/ 
float  temp_gamma [2»T0T] ;  /*  squashed  version  of  gamma  */ 
int  i; 

/*  initialize  sound  objects  •/ 
mySound“ [Sound  new) ; 
newSound” [Sound  new) ; 

/*  get  sound  data;  timeReal  has  float  rep.  of  sound  */ 
get_data (ttemp, argv [1 ] ) ; 

/•  get  float  values  of  signal  */ 
loopi (TOT) 

signal [i 1 “ (float) temp [i ] ; 

/*  clear  rest  of  signal  •/ 

for { i»TOT ; i<T0T+N_DELTA_T/2 ; ++i ) 

signal [il«0; 

/»  zero  out  timelmagO  */ 
loopi (N_DELTA_T) 
t imelmag [i 1 -0; 

/*  clear  gamma [ ]  */ 
loopi (2*TOT) 
gamma [i ]=0; 

/*  generate  biorthogonal  window  •/ 
get_garoma (gamma, atoi (argv[2) ) ) ; 

/•  Copy  elements  of  data  and  gamma  function  to  temporary  arrays  and 
squash  float  data  from  -1  to  1  */ 
loopi (TOT) 

signal [ i ] -squash (signal [i],lE6,-lE6,1.0,-1.0); 
loopi (2*TOT) 

temp_gamma [i ] -squash (gamma [i], 200, -200, 1.0, -1.0); 

/*  Initialize  the  DSP  chip  */ 

DSPAPInit  0 ; 

/*  This  computes  Eq.  (14)  in  the  Einziger  reference  using  the  DSP  chip 
but  with  50%  overlap  windows  */ 
loopi (M_TOTAL_DTS*2) ( 
if  (i%2==0) 

window_data (&temp_gamma (TOT- i •N_DELTA_T/2 )  ,  signal, 

StimeReal (i*N_DELTA_T] ) ; 
else 

window_data (4temp_gamma [TOT- (i-1) *N_DELTA_T/2J , tsignal [N_DELTA_T/2 ] , 
4timeReal(i*N_DELTA_T)) ; 

printf ("windowing  pass  no.  %d--%d\n", i, i%2) ; 

) 

/*  This  computes  the  EFT  of  each  windowed  time  slice  •/ 
loopi (M_TOTAL_DTS*2) { 
f ft_wind_data (itimeReal [i*N_DELTA_Tl , 
timelmag, 

igaborReal [i*N_DELTA_Tl , 
igabor Imag [ i •N_DELTA_T1 ) ; 

) 

/•  Free  the  DSP  chip  */ 

DSPAPFree  0 ; 

get_gabor_mag (gaborReal , gaborlmag, gaborMag)  ; 
get_gabor_phase (gaborReal, gaborlmag, gaborPhase)  ; 

write_output_f iles (temp, gaborMag, gaborPhase, gamma, argv [ 1 ] , argv [2] ) ; 

) 
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Appendix  B.  Code  for  Short-Time  Fourier  Transform 


/»**•******•*************  spectrogram .m  »***********************/ 

/*  Written  by  Rick  Ricart,  Capt,  USAF  */ 


/*  Computes  the  spectrogram  of  arbitrary  signals  */ 
/*  Call  program  as:  spectrogram  filename. snd  vail  val2  */ 
/*  Where  vall-0  for  a  Rectangular  window,  vall“l  for  a  */ 
/*  Gaussian  window  and  vall”2  for  a  Hamming  widow  */ 
/*  val2  is  the  standard  deviation  of  the  Gaussian  noise  */ 


/*  If  val2«0  no  noise  is  added  */ 


limport  <sound/sound.h> 

I import  <math.h> 
limport  <dsp/arrayproc.h> 
limport  <mach.h> 
limport  <stdlib.h> 
limport  <stdio.h> 
limport<objc/objc .h> 

I import <soundk it /Sound. h> 
limport<soundkit /soundkit . h> 
I import <string.h> 
limport<macros .h> 


Idefine  N_DELTA_T  256 
♦define  K_TOTAL_DFS  128 

♦define  M_TOTAL_DTS  66  /*  Change  this  accordingly  •/ 

♦  define  TOT  N_DELTA_T*M_TOTAI,_DTS 
♦define  PI  3.141592651 

♦  define  ABS(x)  ( (float) sqrt (( (double) x) *( (double) x) ) ) 

/*  Global  Variables  */ 
id  mySound,  newSound; 

SNDSoundStruct  * soundSt ruct ,  ‘convertStruct; 

/*  the  filenames  */ 

char  spectrofile [80] ,  phasef ile (80) ,  signalf ile  [80] ,  newsignalf ile ( 80) , 
newsoundf ile [80] ; 

/*  Procedures  and  Functions  */ 
float  gasdev  ()  ; 

void  get_data  (short  *»temp,char  'infile) 

( 

int  error,  data_size,i; 

BOOL  edit; 

/• 

inPtr  ”  (short  *)  ((char  *)  inputSound  +  inputSound->dataLocation) ; 
outPtr  “  (short  *)  ((char  •)  *outputSound  +  (*outputSound) ->dataLocation) ; 

•/ 

error  «  [mySound  readSoundfile : infile] ; 

/*  initialize  mySound  to  infile's  mySound  object  */ 
soundStruct  -[mySound  soundStruct ] ; 

[mySound  isEditable]; 

data_size  -  soundStruct->dataSize; 

printf ("data  size  is  %d\n" , data_size) ; 

SNDAl 1 oc ( t  convert  St  ruct , data_s i ze , SND_FORMAT_LINEAR_l 6, SND_RATE_CODEC , 
soundStruct->channelCount, "") ; 

SNDConvertSound (soundStruct , tconvertStruct) ; 

•temp-  (short  *)  (convertStruct  +  convertStruct->dataLocation)  ; 
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printf ("Finished  get_data () \n") ; 
) 


void  write_output_files (short  "signal, float  "newsignal, 

float  "Phase,  float  "spectrogram, 

float  "window,  char  "infile,  char  "wtype) 

{ 

FILE  "fl, "f2, "f3, "f4, "f5; 
int  k,m,n; 
sire_t  s_len; 

/"  get  length  of  input  file  not  counting  null  terminator  "/ 
s_len-strlen  (inf ile) ; 

/"  start  name  of  newsignalf ile  with  "new_(wtype) *  •/ 
strcpy (newsignalf ile, *new_") ; 
strcat (newsignalfile, wtype)  ; 

/"  start  name  of  signalfile  with  infile  minus  suffix  .snd  •/ 
strncpy (signalfile, infile, (s_len-4) ) ; 

/"  must  add  terminating  null,  strncpy  does  not  automatically  do  it  "/ 
signalfile [s_len-4] =' \0' ; 

/*  start  name  of  phasefile  with  infile  minus  suffix  .snd  */ 
strncpy (phasefile, infile, (s_len-4 ) ) ; 
phasefile [s_len-4 ] =' \0' ; 

/*  start  name  of  spectrofile  with  infile  minus  suffix  .snd  "/ 
strncpy (spectrofile, infile, (s_len-4) ) ; 
spectrofile (s_len-4 ] »' \0' ; 

/"  signalfile«filename . dat  "/ 
strcat (signalfile, " .dat") ; 

/"  newsignalf ile“new_f ilename . dat  "/ 
strcat (newsignalfile, signalfile) ; 

/"  add  in  window  index  "/ 
strcat (phasefile, wtype) ; 
strcat (spectrofile, wtype) ; 

/*  phasefile-filename_Phase.dat  "/ 
strcat (phasefile, "_Phase.dat") ; 

/*  spectrofile«f ilename_spectrogram.dat  "/ 
strcat (spectrofile, "_3pectr0gram.dat") ; 

/•  open  and  write  signal  data  to  signalfile  "/ 
if  ((fl  •  fopen (signalfile, "w") )  ==  NULL) ( 

printf ( "\n"" "  Cannot  can't  create  %s  """", signalfile) ; 
exit (0) ; 

) 

loopn (TOT) 

fprintf(fl,"ld\n", signal [n] ) ; 

/"  open  and  write  newsignal  data  to  newsignalfile  "/ 
if  ( (f2  -  fopen (newsignalf ile, "w") )  «-  NULL) { 

printf (*\n"""  Cannot  can't  create  %s  **•", newsignalfile) ; 
exit (0) ; 

) 

loopn (TOT) 

fprintf(f2,"%f\n", newsignal (n) ) ; 

/"  open  and  write  Phase  data  to  phasefile  "/ 
if  ((f3  »  fopen (phasefile, "w") )  —  NULL) ( 

printf ("\n"""  Cannot  can't  create  %s  """", phasefile) ; 
exit  (0) ; 

) 

loopra (TOT) 

fprintf (f3, "%f\n”, Phase (m) ) ; 

/"  open  and  write  spectrogram  data  to  spectrofile  "/ 
if  ((f4  -  fopen (spectrofile, "w" ) )  »«  NULL) { 

printf  ("\n"""  Cannot  can't  create  4s  """",  .spectrofile)  ; 
exit (0)  ; 

) 
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loopm (TOT) 

fprintf  (f4,  "%f  \n",  spectrogram[nil )  ; 

/*  if  ( (fS  -  fopen ("gauss_window.dat", "w") )  «-  NULL) { 

printf  Cannot  can't  create  %s  ••gauss_window.dat") ; 

exit (0) ; 

) 

loopm (N_DELTA_T) 

fprintf  (f5,  '■%f\n’',window[m] )  ;  */ 

/*  close  all  files  •/ 
fclose (fl) ; 
fclose (f2) ; 
fclose (f 3) ; 
fclose (f 4) ; 

/‘fclose(f5)  ;*/ 

) 

void  get_window (float  *hamWindow,  int  window) 

( 

int  n; 

double  time; 

/*  create  window  */ 
if  (window==0) 
loopn (N_DELTA_T) 

hamWindow (n 1 “1 . 0;  /*  Rectangular  */ 
else 

if (window«»l) 
loopn (N_DELTA_T) ( 

time= (double) (n-N_DELTA_T/2) / (double) (N_DELTA_T); 
hamWindow (n) = (float ) (exp (-PI*4 . 0*time*time) ) ;  /*  Gaussian  */ 

) 

else 

if  (window==2) 
loopn (N_DELTA_T) 

)iamWindow  (n)  =  (float)  (0.54-0.46*  /*  )iamming  */ 

cos ( (double)PI* (double)  (2*n) / (double)  (N_DELTA_T-1) ) ) ; 

else 

if  (window"3) 
loopn (N_DELTA_T) 

hamWindow [n] *  /*  one  period  sin  */ 

(float)  (sin ( (double) PI* (double) n/ (double )N_DELTA_T) )  ; 

) 

void  ham_data  (float  *signal,  float  *hamWindow,  float  *timeReal) 

( 

int  n; 

/*  multiply  every  N_DELTA_T  points  of  data  by  window  •/ 
loopn (N_DELTA_T) 

timeReal [n)«3ignal [n] •hamWindow(n]  ; 

) 


/*  squash  data  */ 

float  squash (float  val,  float  oldmax,  float  oldmin,  float  newmax,  float  newmin) 
( 

float  answer; 

answer” ( ( (val-oldmin) / (oldmax-oldmin) ) * (newmax-newmin) ) tnewmin; 
return  answer; 

) 


void  fft_wind_data (float  *timeReal,  float  *timelraag,  float  *gaborReal, 
float  *gaborImag) 
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< 

I  define  DATA_ADR  DSPAPGetLowestAddressXY () 

<  define  COEF_ADR  (DATA_ADR  +  N_DELTA_T) 

#  define  IMAG_DATA  DSPMapPMemY (DATA_ADR) 
f  define  REAL_DATA  DSPMapPMemX  (DATA_ADR) 

#  define  SIN_TABLE  DSPMapPMemY (COEF_ADR) 
f  define  COS~TABLE  DSPMapPMemX (COEF_ADR) 

float  *sinTab  -  DSPAPSinTable {N_DELTA_T) ; 
float  ‘cosTab  -  DSPAPCosTable (N_DELTA_T) ; 
int  i,m; 

/*  DSPFix24  arrays  */ 

DSPFix2  4  TimeReal (N_DELTA_T) ,TimeImag[N_DELTA_Tl .GaborReal [N_DELTA_T] , 
GaborImag[N_DELTA_T] ;  ~ 


/*  Squash  float  data  from  -1  to  1  */ 
loopi (N_DELTA_T) { 

timeReal  [i]  -squash  (timeReal  [il,lE6,-lE6,1.0,-1.0); 
timelmag ( i 1 -squash (timelmag [il,lE6,-lE6,1.0,-1.0); 

) 

/*  Convert  data  from  float  to  DSPFix24  •/ 

DSPFloatToFix24Array (timeReal,  TimeReal,  N_DELTA_T); 
DSPFloatToFix24Array (timelmag,  Timelmag,  N_DELTA_T) ; 

/*  put  the  time  domain  complex  array  */ 

DSPAPWriteFix24Array  (TimeReal,  REAI,_DATA,  1,  N_DELTA_T)  ; 
DSPAPWriteFix24Array  (Timelmag,  IMAG_DATA,  1,  N_DEI,TA_T)  ; 

/*  put  the  cos  and  sine  tables  */ 

DSPAPWriteFloatArray (cosTab,  COS_TABLE,  1,  N_DELTA_T/2) ; 
DSPAPWriteFloatArray (sinTab,  S1N_TABLE,  1,  N_DELTA_T/2) ; 

DSPAPfftr2a (N_DELTA_T,  DATA_ADR,  COEF_ADR) ; 

/* 

*  Get  the  gabor  domain  complex  array. 

*  Tell  monitor  to  read  the  array  back  with  bit-reversed 

*  addressing  because  fftr2a  leaves  its  output  shuffled. 

*  Note  that  the  skip  factor  (N_DELTA_T/2)  is  used  to  set  the 

*  DSP  'N'  register. 

*/ 

DSPSetDMAReadMReg (0)  ; 

DSPAPReadFix24Array (GaborReal,  REAL_DATA,  N_DELTA_T/2,  N_DELTA_T) ; 
DSPAPReadFix24Array (Gaborlmag,  IMAG_DATA,  N_DELTA_T/2,  N_DELTA_T) ; 
DSPSetDMAReadMReg (-1) ;  /*  re-select  linear  addressing  */ 

/*  Convert  data  from  DSPFix24  to  float  */ 

DSPFix24ToFloatArray  (GaborReal,  gaborReal,  N_DEI,TA_T)  ; 
DSPFix24ToFloatArray (Gaborlmag,  gaborlmag,  N_DELTA_T) ; 

/*  Squash  float  data  from  -1  to  1  */ 
loopi (N_DELTA_T) { 

gaborReal [i ] -squash (gaborReal [il,1.0,-1.0,6E3,-6E3); 
gaborlmag ( i ) -squash  (gaborlmag (i 1 , 1 . 0, -1 . 0, 6E3, -6E3) ; 

) 


void  get_phase (float  ‘spectroReal ,  float  *spectroImag,  float  *Phase) 

( 

int  k,m; 
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loopm (M_T0TAL_DTS*2) 
loopk {K_TOTAL_DFS) 

if  (  (spectroReal  [m*N_DELTA_T+k]  =*0)  tl  <spectroImag[m*N_DEI.TA_T+k]  “=0)  ) 

Phase [m*K_TOTAL_DFS+k]  =  ( (float)PI/2)  ; 
else 

Phase [m*K_TOTAL_DFS+k ] - 

(  (float )  atan2  (  (double)  spectrolmag  [m*N_DELTA_T+lc) , 

(double)  spectroReal  tm*N_DELTA_T+)c) )  )  ; 

> 

void  get_spectrogram (float  "spectroReal, float  "spectrolmag,  float  "spectrogram) 

{ 

int  )c,m; 

loopm (M_T0TAL_DTS"2) 
loop)t  (K_TOTAL_DFS) 
spectrogram  [m*K_TOTAL_DFS+)cl  = 

(float)  sqrt  (  (double)  spectroReal  (m*N_DELTA_T+)c)  " 

(double)  spectroReal  [m*N_DELTA_T+)il  + 

(double)  spectrolmag  [m*N_DELTA_T+)c  1  * 

(double)  spectrolmag  [m*N_DELTA_T+)t  1  )  ; 

) 

void  generate  tables (float  "cosine, float  "sine) 

{ 

int  )t,m,n; 
double  temp; 

/*  create  sine  and  cosine  tables  */ 
loop);  (K_TOTAL_DFS) 
loopn (N_DELTA_T) { 

cosine  ()c"N_DELTA_T+nl«  (float)  (cos  (PI •  (double)  (2*);*n) 

/ (double) N_DELTA_T) ) ; 

sine();*N_DELTA_T+nl-(float)  (sin  (P I  *  (double)  (2")c"n) 

/ (double) N_DELTA_T) ) ; 

) 

) 

void  regenerate_sound (float  "Real,  float  "Imag, 
float  "new_soundl,  float  "new_sound2, 
float  "cosine,  float  "sine) 

{ 

int  i,)t,m,n; 
loopm (M_TOTAL_DTS) { 

/*  printf ("regenerating  time  slice  %d  ...\n",m);  "/ 
loop);  (K_TOTAL_DFS) 
loopi {N_DELTA_T) 
new_soundl (m"N_DELTA_T+i ] +« 

(Real  [2"m*N_DELTA_T+)c)  " 
cosine  f)c"N_DELTA_T+i)  - 
Imag[2"m"N_DELTA_T+)c)  " 
sine()t"N_DELTA_T+i]  )  ; 

) 

loopm (M_TOTAL_DTS) { 

/"  printf ("regenerating  time  slice  %d  ...\n",ra);  "/ 
loop);  (K_TOTAL_DFS) 
loopi (N_DELTA_T) 
new_sound2 (m"N_DELTA_T+i ] +« 

(Real  [  (  (2"m)  +1 )  "N__DELTA_T+);1  • 
cosine  ();*N_DELTA_T+i)  - 
Imag(  (  (2"m)  +1)  "N_DELTA_T+);1  " 
sine(l;"N_DEI,TA_T+i)  )  ; 
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) 

) 


void  ham_Newdata (float  *new_soundl , float  *new_sound2, 
float  *hamWindow, char  *infile,  int  window) 

{ 

int  )c,m,  n; 

/♦  multiply  every  N_DELTA_T  points  of  data  by  window  */ 

/•  loopm (M_TOTAL_DTS) 
loopn (N_DELTA_T) 

new_soundl [m*N_DELTA_T+n] *“hamWindow [nj  ;  */ 

/*  mulitply  every  N_DELTA_T  points  by  window  */ 

/*  loopm (M_TOTAL_DTS) 
loopn (N_DELTA_T) 

new_sound2  [m*N_DELTA_T+n]  *=hzunWindow [n]  ;  */ 

/*  if  window  not  rect,  add  the  overlapped  reconstructed  data  points  */ 

if (window)  (  /*  if  window  is  not  rectangular...  */ 

printf ( "Adding  overlapped  sample;  not  rectangularVn") ; 

loopm (M_TOTAL_DTS) 

loopn (N_DELTA_T) 

new_soundl [m*N_DELTA_T+n+N_DELTA_T/2) += 
new_sound2 [m*N_DELTA_T+n] ; 

) 

else 

printf ("Not  adding  overlapped  samples;  rectangular\n") ; 

/*  loolc  for  overflow  data  points  */ 
k=0; 

loopn (TOT) 

if  (ABS  (new_soundl  [n] )  >32')67 . 0) 

+  =  1; 

printf  ("\nl  found  %d  overflow  data  points  processing  %s\n",  )c,  infile)  ; 

) 

void  add  noise  (float  "signal, float  sdev) 

{ 

int  i,idum; 

/*  Noise  power  is  the  variance  or  sdev*sdev  of  Gaussian  noise  */ 
float  temp; 

/*  set  variance  */ 
i  dum  =  - 1  ; 

loopi (TOT) 

Signal (i)  +=sdev*gasdev (S idum) ; 

) 


main  (int  argc,  char  **argv) 

{ 

short  "temp;  /*  integer  representation  of  original  data  */ 
float  "timeReal;  /♦  windowed  overlapped  signal  data*/ 
float  ‘signal; 

float  *timelmag;  /*  the  imaginary  portion  of  the  data  */ 
float  *Phase; 

float  *spectroReal,  *spectroImag,  ‘spectrogram; 

float  ‘hamWindow; 

float  ‘now_soundl , ‘new_sound2 ; 

float  ‘cosine;  /‘  the  cosine  table  ‘/ 

float  ‘sine;  /‘  the  sine  table  ‘/ 

size_t  s_len; 

int  i; 

MALLOC (timeReal, 2‘TOT, float, "timeReal")  ; 

MALLOC (signal, TOT+N_DELTA_T/2 , float , "signal”)  ; 
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MALLOC (timelmag, N_DELTA_T, float, "timelmag" ) ; 

MALLOC (Phase, TOT, float, "Phase") ; 

MALLOC (spectroReal, 2*TOT, float, "spectroReal") ; 

MALLOC (spectrolmag, 2*T0T, float, "spectrolmag" )  ; 

MALLOC (spectrogram, TOT, float, "spectrogram")  ; 

MALLOC (hamWindow, N_DELTA_T. float, "hamWindow")  ; 

MALLOC (new_soundl , TOT +N_DELTA_T/2, float, "new  soundl") ; 

MALLOC (new_sound2,T0T, float, "new  soundl") ; 

MALLOC (cosine, K_TOTAL_DFS*N_DELTA_T, float, "cosine  table") ; 

MALLOC (sine,K_TOTAL_DFS*N_DELTA_T, float, "sine  table") ; 

my Sound- [Sound  new) ; 

newSound- [Sound  new] ; 

loopi (TOT) ( 

new_soundl [i 1=0; 

new_sound2 [i] =0; 

) 

/*  get  sound  data;  timeReal  has  float  rep.  of  sound  */ 
get_data (Stemp, argv '  ,); 

/•  copy  and  change  integer  sound  samples  to  floats  */ 
loopi (TOT) 

signal[i)=  (f loat ) temp [i ] ; 
print f ( "Accessed  data!\n"); 

/*  add  AWGN  to  signal  */ 
if  (atoi  (argv[3])  !=0) 
add_noise (signal , atof (argv [3 ] ) ) ; 

/*  pad  rest  of  signal  with  zeroes  */ 
for (i=T0T; i<TOT+N_DELTA_T/2; ++i) ( 
signal [ i ] =0; 
new_soundl [i)=0; 

) 

/*  compute  hamming  window  •/ 
get_window (hamWindow, atoi (argv [2  J ) ) ; 

/*  This  multiplies  signal  times  window  with  50%  overlap  •/ 
/•*••*•***•««  change  ham_data  to  array  processor  routine  on  NeXT  ««••/ 
loopi (M_TOTAL_DTS*2) ( 

ham_data (Ssignal [i*N_DELTA_T/21 , hamWindow, 

StimeReal [i*N_DELTA_T] ) ; 

printf ("windowing  pass  no.  %d\n",i); 

) 

loopi (N_DELTA_T) 
timelmag [ i ] -0; 

/*  compute  the  dft  of  the  windowed  data  */ 

DSPAPInit  0 ; 

loopi (M_T0TAI_DTS*2) ( 

f f t_wind_data (StimeReal [i*N_DELTA_T) , 

timelmag, 

t spectroReal [i*N_DELTA_T) , 

Sspectrolmag[i*N_DELTA_T] ) ; 

printf  ("  spectrogram  pass  no.  %d\n",i); 

) 

DSPAPFree  ()  ; 

get_phase (spectroReal, spectrolmag, Phase) ; 
get_spectrogram (spectroReal, spectrolmag, spectrogram) ; 
generate_tables (cosine, sine) ; 

regenerate_sound (spectroReal, spectrolmag, new_soundl , new_sound2, 
cosine, sine) ; 

ham_Newdata (new_soundl , new_sound2, hamWindow, argv[l ) , atoi (argv  [2 ] ) )  ; 
write_output_f iles (temp, new_soundI , Phase, spectrogram, 
hamWindow, argv [1 ) , argv [2 ) ) ; 

there  are  any  overflow  oata  points  truncate  to  +  or-  32767  */ 
loopi (TOT) 

temp [ i ] - (ABS (new_soundl [i )) >32767)  ? 

( (short ) (new_soundl [ i ) /ABS (new_soundl (i))* 32767.0)) : 


125 


( (short) new_soundl [i] )  ; 

/*  get  length  of  input  file  not  counting  null  terminator  */ 
s_len”strlen (argv [1 ] ) ; 

/*  start  name  of  newsoundfile  with  ■new_*  */ 
strcpy (newsoundfile, "new_") ; 
strcat (newsoundfile, argv (2 ] ) ; 

/*  concatenate  input  file  name  to  “new_*  in  newsoundfile ( ]  */ 
strcat (newsoundfile, argv [1 ]) ;  /*  newsound ( ]“new_f ilename . snd  */ 
SNDWriteSoundf ile (newsoundfile, convertStruct) ; 


) 


126 


Appendix  C.  Code  for  Reconstruction  Algorithm  from  Short-Time  Fourier 


Spectrum 


/•«•*•***•»*****•***«*•**  regenerate.m  ••»•••*•*»»«»«*» 

/*  Written  by  Rick  Ricart,  Capt,  USAF  */ 

/*  This  program  reconstructs  signals  from  the  LIN  output  of  gabor  magnitudes  */ 
/*  This  version  is  for  overlapped  coefficients  */ 

/*****•****•*»***»***»•***»***•»**«•*•»*»•»••••***«•*»•*****•*•••••*»**»•»*»»•/ 

♦import  <sound/sound,h> 

♦import  <dsp/arrayproc.h> 

♦import  <math.h> 

♦import  <mach.h> 

♦import  <stdlib.h> 

♦import  <stdio.h> 

♦ import <objc/objc.h> 

♦ import <soundk it /Sound. h> 

♦import<soundkit/soundkit .h> 

♦ import <st ring. h> 

♦import<macros .h> 

♦define  N_DELTA_T  256 
♦define  K_TOTAL~DFS  128 
♦define  M_TOTAL_DTS  78 

♦define  TOT  N_DELTA_T*M_TOTAL_DTS  /*  approximately  one  sec  of  sound  */ 

♦define  TAU  1  /*  exponential  time  constant  */ 

♦define  PI  3.141592654 

♦  define  ABS(x)  ( (float) sqrt (( (double) x) *( (double) x)  ) ) 


/*  Global  Variables  */ 
id  my Sound; 

SNDSoundStruct  *soundStruct, *convertStruct; 
char  newsoundfile (801 , newsignalf ile (801  ; 


/*  Procedures  and  Functions  */ 
void  get_soundstruct () ; 
void  get_data(); 
void  f ind_energy 0 ; 
void  get_real_imag()  ; 
void  write_output_files 0  ; 
void  generate_tables 0 ; 
void  regenerate_sound 0 ; 

void  get_window 0 ; 
void  ham_Newdata () ; 
void  output_newSpect 0 ; 

/‘These  are  needed  for  splines  */ 
void  spline ( ) ; 
void  splint  0  ; 
float  ‘vector  (); 

main  (int  argc,  char  “argv) 

( 

FILE  ‘fl; 

float  ‘new_soundl , *new_sound2; 

short  ‘temp;  /‘  integer  representation  of  reconstructed  signal  ‘/ 
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float  *spectroReal,  ‘spectrolraag,  ‘spectrogram; 
float  *new_spectrograra, ‘Phase; 
float  ‘hamWindow; 

float  ‘cosine;  /‘  the  cosine  table  ‘/ 
float  ‘sine;  /‘  the  sine  table  ‘/ 
int  i,size; 
size_t  s_len; 

MALLOC (spectroReal, 2‘TOT, float, "spectroReal*) ; 

MALLOC (spectrolmag, 2‘TOT, float, "spectrolmag") ; 

MALLOC (spectrogram, TOT, float, "spectrogram") ; 

MALLOC (new_spectrogram, TOT, float, "new  spectrogram") ; 

MALLOC (Phase, TOT, float, "Phase") ; 

MALLOC (hamWindow, N_DELTA_T, float , "hamWindow" )  ; 

MALLOC (new_3oundl,T0T+N_DELTA_T/2, float, "new  soundl") ; 

MALLOC (new_3ound2, TOT, float, "new  soundl ") ; 

MALLOC (cosine, K_TOTAL_DFS‘N_DELTA_T, float,  "cosine  table") ; 

MALLOC (sine,K_TOTAL_DFS‘N_DELTA_T, float, "sine  table") ; 
loop! (TOT) ( 
new_soundl  [  i  ]  *=0  ; 
new_sound2 [i]“0; 

) 

/*  pad  rest  of  signal  with  zeroes  ‘/ 
for (i=TOT; i<T0T+N_DELTA_T/2; ++i) 
new_soundl (i]=0; 

get_data (spectrogram, new_spectrogram, Phase, argv  fl ] , argv [2] , argv [3 ) ) ; 

/*  f ind_energy (new_spectrogram) ;‘/ 

get_real_imag (spectrogram, new_spectrogram, Phase, spectroReal, spectrolmag) ; 
generate_tables (cosine, sine) ; 

regenerate_sound (spectroReal, spectrolmag, new_soundl , new_sound2, 
cosine, sine) ; 

/*  compute  hamming  window  ‘/ 
get_window (hamWindow) ; 

hara_Newdata (new_soundl, new_sound2 , hamWindow, argv(l ] ) ; 
my Sound= (Sound  new] ; 
get_soundstruct (stemp) ; 
size“SOundStruct->dataSize; 

/*  clear  old  sound  ‘/ 
loop! (size) 
temp [i] -0; 
loopi (TOT) 

temp [i ] “ (ABS (new_soundl [ i ]) >32767) ? 

((short) (new_soundl [i] / 

ABS (new_soundl (i] ) ‘32767 .0)  )  : 

( (short) new_soundl [i] ) ; 

/*  loopi (TOT) 

temp ( i ] - (short ) (cos (PI* (double) (2‘20‘i) / (double) N_DELTA_T) ‘10000) ; */ 

/‘  get  length  of  input  file  not  counting  null  terminator  ‘/ 
s_len“strlen (argv(3) ) ; 

/‘  start  name  of  newsoundfile  and  newsignalfile  with  "short_"  ‘/ 
strcpy (newsoundfile, "short_") ; 
strcpy (newsignalfile, "short_") ; 

/‘  concatenate  name  of  original  sound  file  minus  "_Phase.dat"  identifier  ‘/ 
strncat (newsoundfile, argv(3) , (s_len-10) ) ; 

St  meat  (newsignalfile,  argv  [3] ,  (s_len-10)  ) ; 

St r cat (newsoundfile, " . snd") ; 
streat (newsignalfile, " . dat") ; 

printf("The  new  sound  file  is  %s\n", newsoundfile) ; 

SNDWriteSoundf ile (newsoundfile, convertStruct) ; 

/*  open  and  write  signal  data  to  newsignalfile  ‘/ 
if  ( (fl  “  fopen (newsignalfile, "w") )  -=  NULL) { 

printf("\n‘“  Cannot  can't  create  %s  newsignalfile) ; 

exit  (0)  ; 

) 

loopi (TOT) 
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£printf(fl,"%f\n", new_soundl [i  )  )  ; 
fclose (fl) ; 


void  get_soundstruct (short  **temp) 

{ 

int  error, data_3i2e, i; 

BOOL  edit; 

/* 

inPtr  -  (short  *)  ((char  •)  inputSound  +  inputSound->dataLocation) ; 
outPtr  ”  (short  •)  ((char  *)  ‘outputSound  +  (‘outputSound) ->dataLocation) 

*/ 

error  =  [mySound  readSoundfile ; "blank . snd" ) ; 

/•  initialize  mySound  to  infile's  mySound  object  */ 
soundStruct  “[mySound  soundStruct ) ; 
data_size  =  soundStruct->dataSize; 

SNDAlloc (4convertStruct, data_size, SND_F0RMAT_LINEAR_16,SND_RATE_C0DEC, 
soundStruct->channelCount, "  ")  ; 

SNDConvertSound(soundStruct, iconvertStruct) ; 

*temp=  (short  * ) (convertStruct+convertStruct->dataLocation) ; 
printf  ("Leaving  get_soundstruct  ()  \n")  ; 

) 


void  get_data (float  "gaborMag, float  *new_gaborMag, float  'gaborPhase, 
char  "infilel, char  *infile2, char  *infile3) 

I 

int  i; 

FILE  *fl, *f2, »f3; 

printf ("the  three  files  are  %s,  %s,  and  %s\n", infilel, infile2, inf ile3)  , 

if((fl  “  fopen (inf ilel , "r" ) ) “=NULL) { 

printf ("\n»**  I  can't  read  %s  *•*", inf ilel ) ; 

exit  (1)  ; 

} 

if  (  (f2  “  fopen (infile2, "r") ) “=NULL)  { 
printf ("\n***  I  can't  read  %s  *  *  * " , inf ile2) ; 
exit  (1)  ; 

) 

if((f3  =  fopen (infile3, "r") ) ““NULL) { 

pr int f (”\n* *  *  I  can't  read  %s  * • * " , inf ile3) ; 

exit  (1 )  ; 

) 

f or (i“0;  (f scanf (f 1 , "%f " , SgaborMag [ i 1 )  !=EOF) ; ++i) ;  /*  reading  gaborMag  */ 
for  (i“0;  (f scanf (f 2, "%f " , Snew_gaborMag [i ) )  ! =EOF) ; ++i) ; 
for (i“0; (fscanf (f3, "%f", igaborPhase [i] ) !“EOF) ;++i) ; 

fclose  (f 1) ; 
fclose (f2) ; 
fclose  (f3) ; 

/*  printf ("Leaving  get_data\n") ; •/ 

} 

void  find_energy (float  *new_gaborMag) 

( 

int  k,m,n; 

float  max, energy , threshold; 

/*  clear  variable  max  */ 
max“0 . 0; 

/*  set  average  threshold  energy  */ 
threshold“0 . 05; 

/*  determine  the  max  value  in  current  time  slice  */ 
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loopm (M_T0TAL_DTS*2) 
loopk (K_TOTAL_DFS) 

max“  {new_gaborMag  [m*K_TOTAL_DPS+k]  >niax)  ? 

(new_gaborMag [m*K_TOTAL_DFS+k] ) :max; 

/*  printf ("The  max  value  is  %7 . 4g\n",max) ; •/ 

/*  divide  by  max  value  of  new_gaborMag [ ]  */ 
loopm (M_T0TAL_DTS*2) 
loopk (K_TOTAL_DFS) 

new_gaborMag[m*K_TOTAL_DFS+k)  /=max; 

/*  find  the  energy  in  each  -ISmsec.  window  •/ 

loopm(M_T0TAL_DTS*4) ( 

n=0; 

energy=0 . 0; 

loopk (K_T0TAL_DFS/2) 

if (new_gaborMag[m»K_T0TAL_DFS/2+k] !=0)  { 

+  +  n; 

energy  +*  new_gaborMag(m*K_T0TAL_DFS/2+kl  ; 

) 

energy  /” (float) n; 

/*  printf  ("window  %d--energy  .  4g--survivors  %d\n",m,  energy,  n)  ;  */ 

/*  if  the  energy  is  less  than  the  threshold  zero  entire  window  */ 

if (energy<threshold) 

loopk (K_TOTAL_DFS/2) 

new_gaborMag (m*K_TOTAL_DFS/2+k) =0; 

) 

/*  multiply  by  max  value  of  new_gaborMag [ )  */ 
loopm  (M_T0TAI._DTS*2) 
loopk (K_TOTAL_DFS) 

new_gaborMag  (m*K_TOTAI,_DFS+kl  *=max; 

) 

void  get_real_imag  (float  *gaborMag, float  *new_gaborMag,  float  "gaborPhase, 
float  "gaborReal,  float  *gaborImag) 

{ 

i  nt  m ,  k ,  i ; 
float  "temp, *org; 

/*  the  following  are  for  spline,  splint  calls  */ 
float  *x_set , *y_set , *y2 , new_y; 
float  ans; 


MALLOC (temp, TOT, float, "temp  spect") ; 

MALLOC (org, TOT, float, "temp  spect" ) ; 

/*  Use  new_gaborMag  (the  truncated  gabor  coefficients)  as  an  index  */ 

/*  Do  this  by  zeroing  out  gaborMagC]  components  which  correspond  to  */ 

/*  zero  components  of  new_gaborMag [ ]  */ 

memcpy (org, gaborMag, T0T*sizeof (float) )  ; 

loopm (M_T0TAL_DTS*2) 

loopk (K_TOTAL_DFS) 

if ( (new_gaborMag[m*K_TOTAL_DFS+k)-=0)  I  I  (k<0) ) 

/*  if(k>100)*/ 

gaborMag [m*K_TOTAL_DFS+kJ -O; 

memcpy (temp, gaborMag, TOT*sizeof (float ) ) ; 

/*  Fill  in  freq  and  phase  components  with  spline  interpolation  ♦/ 

/*  x_set«vector (1 , 128) ; 
y_set«vector (1, 128) ; 
y2»vector  (1, 128) ; 
loopm (M_TOTAL_DTS*2) { 
i-0; 

loopk (K_TOTAL_DFS) 

if (gaborMag [m*K_TOTAL_DFS+kl  !»0I  lk-=0) { 

+  +  i  ; 

x_set [ i 1 = (float) k; 

y_set [i] -gaborMag [m*K_TOTAL_DFS+kl  ; 

) 
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,y_set  [  i  +  1 )  *127 . 0; 
y_set [i+1 I “0 . 0; 

spline (x_set , y_set , i,0.0,0.0,y2); 
loopk {K_TOTAL_DFS) [ 

splint  (x_set,  y_set,  y2,  i,  (float)  )c,  tans)  ; 
gaborMag  [m*K_TOTAL_DFS+)cl -ans; 

) 

if  (m*”19) 

looplt  (K_TOTAL_DFS)  ( 

printf  ("%5.3g  %5.3g  %5 . 3g\n",  temp [m*K_TOTAL_DFS+)c) , 

gaborMag  [m*K_TOTAL_DFS+)c  ] , 
org  [m*K_TOTAL_DFS+)t) )  ; 

) 

)*/ 

/*  Get  real  and  imaginary  values  from  magnitude  and  p)iase  */ 
loopm (M_T0TAL_DTS*2) 

loop)c  (K_TOTAL_DFS)  (  /•  if  Gaussian  window  *4  •/ 

gaborReal  [m*N_DELTA_T+)c]  =4  .  O+gaborMag  [m*K_TOTAL_DFS+)c]  * 
(float)  cos  (  (double)  gaborPhase  (m*K_TOTAL_DFS+)c) )  ; 
gaborlmag  [m*N_DELTA_T+)il  -4  .  O'gaborMag  (m*K_TOTAL_DFS+)c )  * 
(float)  sin  (  (double)  gaborPliase  lm‘K_TOTAL_DFS+)cl )  ; 

) 

/*  printf ("Leaving  get_real_imag ( ) \n” ) ; * / 

) 

void  generate_tables (float  “cosine, float  “sine) 

( 

i  n  t  )« ,  m ,  n  ; 
double  temp; 

/*  create  sine  and  cosine  tables  */ 
loop)l  (K_TOTAL_DFS) 
loopn (N_DELTA_T) ( 

cosine  [)t*N_DELTA_T+nl  ■  (float)  (cos(PI*  (double)  (2*)i*n) 

/ (double) N~DELTA~T) ) ; 

sine()c*N_DELTA_T  +  nl-(float)  (sin  (PI*  (double)  (2*)c*n) 

/ (double) N_DELTA_T) ) ; 

) 

/*  printf ( "Leaving  generate_tables 0 \n") ; */ 

) 

void  regenerate_sound (float  ‘Real,  float  *Imag, 
float  *new_soundl,  float  *new_sound2, 
float  ‘cosine,  float  *sine) 

( 

int  i  ,  It ,  m,  n; 

Icopm (M_TOTAL_DTS) ( 

/*  printf  ("regenerating  time  slice  %d  ...\n",m);  */ 
loopk (K_TOTAL_DFS) 
loopi (N_DELTA_T) 
new_soundl [m*N_DELTA_T+i ] ♦“ 

(Real [2*m*N_DELTA_T+kl * 
cosine[k*N_DELTA_T+il- 
Imag[2*m*N_DELTA_T+kl * 
sine[k*N_DELTA_T+i)) ; 

) 

loopm (M_TOTAL_DTS) { 

/*  printf ("regenerating  time  slice  td  ...\n",m);  */ 
loopk (K_TOTAL_DFS) 
loopi (N_DELTA_T) 
new_sound2 [m*N_DELTA_T+i ) +» 

(Real [ ( (2‘m) .1) *N_DELTA_T+ki • 
cosine[k*N_DELTA_T+il - 
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Imag[ ( (2*m)+l) *N_DELTA_T+kl  * 
sine [k*N_DELTA_T+i] )  ; 

) 

/*  printf (“Leaving  generate_sound () \n”)  ;  */ 

) 

void  get_window (float  *hamWindow) 

{ 

int  n; 

double  time; 

/*  create  Hamming  window  */ 

loopn (N_DELTA_T) 

hamWindow (n) - (float ) (0.54-0.46* 

cos ( (double) PI* (double)  (2*n) / (double)  (N_DELTA_T-1) ) )  ; 
/*  printf ( “Leaving  get_window ( ) \n“) ; * / 

) 


/*  change  this  to  a  dsp  routine  */ 

void  ham_Newdata ( float  *new_soundl , float  *new_sound2, 
float  *hamWindow, char  *infile) 

( 

int  ){,m,n; 

/•  multiply  every  N_DELTA_T  points  of  data  by  window  */ 
loopm (M_TOTAL_DTS) 
loopn (N_DELTA_T) 

new_soundl (m*N_DELTA_T+nl *“hamWindow [n] ; 

/•  mulitply  every  N_DELTA_T  points  by  window  */ 
loopm (M_TOTAL_DTS) 
loopn (n2dELTA_T) 

new_sound2 [m*N_DELTA_T+n ) *-hamWindow ( n)  ; 

/*  add  the  overlapped  reconstructed  data  points  */ 
loopm (M_TOTAL_DTS) 
loopn (N_DELTA_T) 

new_soundl (m*N_DELTA_T+n+N_DELTA_T/2 ) *= 
new_sOund2 [m*N_DELTA_T+nl  ; 

/*  loo)c  for  overflow  data  points  */ 

)i-0; 

loopn (TOT) 

I f (ABS (new_soundl [nj ) >32167 . 0) 
k  +>1; 

printf ("\nl  found  %d  overflow  data  points  processing  %s\n”, k, infile) 

) 
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Appendix  D.  Code  for  UN  Compression  ofW-H  Spectra 


COMPETE. C 

/*  iniriLLc."  by  Pick  Ricart,  Capt,  USAF  */ 

/*  This  program  computes  LIN  competition  of  W-H  frequency  spaces  */ 

/*  using  the  ERB  and  CB  criteria.  Call  program  as  follow:  */ 

/*  compete  <filename>  <0  or  1>  <threshold>  */ 

/*  where  0  signifies  ERB  and  1  CB.  */ 

/*  Threshold  is  any  value,  v  such  that  0<v<  1,  (e.g.,  .009)  */ 

/*  result  filenames  are  automatically  generated  */ 

linclude  <math.h> 
linclude  <stdio.h> 

♦include  <macros.h> 

♦define  FO  0.25 
♦define  7AU  0.1 

♦  define  EPS  1 . 0 
♦define  MAX  1 . 0 
♦define  MIN  1 . 0 
♦define  COMPGAIN  10.0 
♦define  K_TOTAL_DFS  128 
♦define  M_TOTAL_DTS  70 
♦define  SURVIVORS  50 

♦  define  SQUARE  (y.)  ((x)*(x)) 

♦define  SIGMOID (x)  (SQUARE (x) )/ (SQUARE (x) +F0) 

FILE  out fi le (SO) , result ( 80 ! ; 

typedef  struct ( 
float  input; 
float  activity; 
float  output; 
int  ‘competitors; 
int  num_compet itors ; 

) nodes ; 

float  squash (float  val,  float  oldmax,  float  oldmin,  float  newmax,  float  newmin) 
{ 

float  answer; 

answer*'  (  { (val-oldmin)  /  (oldmax-oldmin)  )  *  (newmax- newmin) )  +newmin; 
return  answer; 

) 

void  write_output  (float  ‘gaborMag,  char  ‘infile,  ir.t  size,  int  bw,  char  ‘thresh) 

( 

f ILE  ‘ £1 , *  fopen  ( ) ; 
int  )c,m,  n; 

print f ("Starting  to  output  filesNn"); 

/‘  start  name  of  outfile  with  "ERB  or  CB"  ‘/ 
if (!bw) { 

strcpy (outfile, "ERB") ; 

/‘  outfileO’,-' \0'  ;•/ 

) 

else  ( 

strcpy (outfile, "CB") , 


133 


/•  outfile[2]='\0';*/ 

) 

strcat (outfile, thresh) ; 

/*  newsignalfile“new_infile  */ 
strcat (outfile, infile) ; 

/*  open  and  write  signal  data  to  signalfile  */ 
if  ((fl  »  fopen (outfile. "w") )  —  NULL) ( 

printf  ("\n* *  *  Cannot  can't  create  %3  **•", outfile) ; 
exit  (0)  ; 

) 

loopn (sire) 

fprintf  (f  1 ,  '•%f  \n",  gaborMag  [nl )  ; 
fclos  5 (fl) ; 

} 

void  write_result (float  *gaborMag,  char  *infile) 

( 

FILE  *fl,  *fopen()  ; 
int  )c,m,n; 

/*  start  name  of  outfile  with  "new_"  */ 
strcpy  (result,  ■'result_'')  ; 

/*  newsignalf ile=new_inf ile  */ 
strcat (result, infile) ; 

if  ((fl  =  fopen  (result, "w") )  NULL) ( 

printf ("\n** *  Cannot  can't  create  %s  ***”, result) ; 
exit  (0) ; 

) 

/*  loopm (M_TOTAL_DTS)  for  nonovelapped  coefficients  loop (M_T0TAL_DTS*2 ) for 
overlapped  coefficients  */ 
loopm (M_TOTAL_DTS*2) ( 
fprintf (fl, "TIME  %d:  ",m); 

loop)^  (K_TOTAL_DFS) 
if  (gaborMag(m*K_TOTAL_DFS+itl  1=0) 

fprintf  (fl,  "ld,%5.2g  ",  )t,  gaborMag  (ra*K_TOTAL_DFS+)c]  )  ; 

fprintf  (fl, "\n") ; 

) 

f close  <f 1 ) ; 

) 


void  compute_activity (nodes  *n_list, nodes  *node) 

{ 

float  a,exc,inh; 
int  i; 

a=node->activity ; 
inh=0; 

exc*node->output+node->input ; 

loopi  (node->nuin_competitors) 

inh+»n_list (node->competitors [i] ) .output; 

node->activity=a+TAU*EPS* (-a+ (MAX-a) *exc-  (MIN+a) *inh*COMPGAIN) ; 
) 

void  compute_output (nodes  'node,  float  thresh) 

{ 

float  a; 

a«node->activity ; 

/•  if(a<0) 
node ->out put -0; 
else 

if  (a>-04Sa<-l) 
node ->out put ”a; 
else 
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node->output=l ; * / 

node->output“ (a-thre3h<0) ?0 : (SIGMOID (a-thresh) ) ; 

) 

void  initiali2e_competitors (nodes  *node,int  index, int  size,  int  bw) 

( 

int  i ,  j, num, count ; 
char  in_string [8] ; 

/*  change  these  appropriately  thru  freq  vals  MUST  BE  AN  EVEN  NUMBER  */ 
/*  Make  this  a  SWITCH  statement  */ 
if(!bw)(  /*  this  does  ERB  competition  */ 
if (index  ==  0) 

printf ("Doing  ERB  competitionXn") ; 
if(index<30)  /*  for  ERB  <40,  CB<30  */ 
num=2;  /*  for  ERB  2,  CB  4  */ 
else 

if(index<58)  /*  for  ERB  58,  CB  41  */ 

num=4;  /*  for  ERB  4,  CB  6  */ 

else 

if(index<75)  /*  for  ERB  75,  CB  60  */ 

num“6;  /*  for  ERB  6,  CB  8  */ 

else 

if(index<92)  /•  for  ERB  91,  CB  72  •/ 
num=8;  /*  for  ERB  8,  CB  10  */ 
else 

if (index<106)  /•  for  ERB  106,  CB  84  */ 
num=10;  /•  for  ERB  10,  CB  12  •/ 
else 

if (index<121)  /*  for  ERB  121,  CB  94  */ 
num*12;  /*  for  ERB  12,  CB  14  */ 
else 

nuir.=  14;  /•  for  ERB  14,  CB  16  */ 

) 

else(  /*  this  does  CB  competition  •/ 
if (index  0) 

printf ( "Doing  CB  competition  ... \n") ; 
if(index<30)  /*  for  ERB  <40,  CB<30  */ 
num=4;  /♦  for  ERB  2,  CB  4  */ 
else 

if(index<47)  /*  for  ERB  58,  CB  47  •/ 

num-6;  /*  for  ERB  4,  CB  6  */ 

else 

if(index<60)  /*  for  ERB  75,  CB  60  */ 

num=8;  /*  for  ERB  6,  CB  8  */ 

else 

if(index<72)  /*  for  ERB  91,  CB  12  */ 
num-10;  /*  for  ERB  8,  CB  10  •/ 
else 

if(index<84)  /*  for  ERB  106,  CB  84  */ 
num-12;  /*  for  ERB  10,  CB  12  */ 
else 

if(index<94)  /‘  for  ERB  121,  CB  94  ♦/ 
num=14;  /*  for  ERB  12,  CB  14  */ 
else 

num»16;  /*  for  ERB  14,  CB  16  */ 

) 

node->num_compet itors^nura; 

MALLOC (node->competitors, num, int, "allocating  num_competitors") ; 

/*  competitors  of  index  less  than  index  of  current  node  */ 

count=l ; 

loopi (num/2) 

if (index-i'l>”0) 

node->competitors [i (“index-i-l ; 

else  ( 
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if (index-l-i+count«“index) 
count  +“1; 

node->competitors [i] “index-l-i+count ; 
count  +=2; 

) 

/*  competitors  of  index  greater  than  index  of  current  node  */ 

ccunt=l ; 

loopi (num/2) 

if (index+l+i<*site-l) 

node->corapetitors [i+num/2] «index+l+i; 

else  ( 

if (index+l+i-count““index) 
count  +=1; 

node->competitors [i+num/2] -index+l+i-count; 
count  +=2; 

) 

/t*ttt**tt»**»*»**  print  statements  ***•»*••*»*»»***/ 

/*  printf("the  current  node  is  %d  and  the  competitors  are:  ", index); 
loopi (num) 

printf ("%d  ",  node->competitors (i) ) ; 
printf  ("\n")  ; 

printf ("Hit  ENTER  to  continue"); 
gets (in_string) ; * / 

/*********«********  ****************y 

) 


main(int  argc,  char  **argv) 

( 

float  "array, "outarray, max, min, temp; 
int  i,j,)c,l,size,  time,  count,  index,  tot  ; 
size_t  s_len; 
nodes  *n_list; 
char  in_string [8 1 ; 

FILE  »fin,»fout; 

/*  open  input  file  */ 

if((fin  =  fopen (argv ( 1 ] , "r") ) ==NULL) ( 
printf  {"\n***  I  can't  read  %s  *  *  * " , argv [ 1 ] ) ; 
exit  (1)  ; 

) 

/*  for  nonoverlapped  gabor  coefficients  size=K_TOTAL_DFS*M_TOTAL_DTS, 
for  overlapped  gabor  coefficients  size=K_T0TAL_DFS*M_T0TAL_DTS*2  */ 

/***«****•**********************************«*************«***«***«******«*/ 

printf ("\n  starting  %s\n", argv [ 1 ] ) ; 

printf ("Bandwidth  %s,  and  threshold  %s\n",  argv(2),  argv[3)); 
s i ze”K_TOTAL_DFS  *M_TOTAL_DTS  *  2 ; 

/*  allocate  memory  */ 

MALLOC (array, size, float, "allocating  array") ; 

MALLOC (outarray, size, float, "allocating  outarray") ; 

MALLOC (n_list, K_TOTAL_DFS, nodes, "allocating  n_list") ; 
tot  -  0; 

/*  copy  values  of  input  file  into  array  */ 
for(i«0; (fscanf(fin,"%f", (array [ i 1 ) ! “EOF) ; ++i) ; 

/*  loopi (size) 

array (i ] ” (array ( i ) ! “0) ? ( (float) loglO ( (double) array (i) ) ) : 0;  */ 

/*  determines  which  nodes  compete  */ 
loopi (K_TOTAL_DFS) 

initialize_competitors (4n_list [i] , i,K_TOTAL_DFS,atoi (argv[21 ) ) ; 

/*  initialize  total  number  of  time  increments  for  solving  shunting  equations  */ 
time=100; 

/*  This  is  the  outer  loop  for  each  time  slice.  Within  this  loop,  the  max 
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magnitude  of  the  Gabor  coefficients  per  time  slice  is  determined  and 
all  frequency  components  in  that  time  slice  are  squashed  between  1  and  -1. 

These  values  are  then  passed  to  the  LIN  and  the  activity  and  outputs 
are  determined  for  each  node  in  the  LIN.  The  result  is  then  saved  in 
outarrayO  */ 

/A****************************************************************************/ 

loopk (M_T0TAL_DTS*2) ( 

/*  clear  variable  max  *! 
max=0 . 0; 

/*  determine  the  max  value  in  current  time  slice  */ 
loopi (K_TOTAL_DFS) 
if (array [k*K_TOTAL_DFS+i ] >max) 
max=array [k*K_TOTAL_DFS+i]  ; 

printf ("\nMax  element  is  %f  at  time  %d - Starting  squash ... \n", max,  k)  ; 

/♦  squash  all  frequency  components  of  current  time  slice  between  1  and  -1  */ 
loopi (K_TOTAL_DFS) 

array (k*K_TOTAL_DFS+i ] “squash (array (k*K_TOTAL_DFS+i 1 .max, 0.0, 1.0, 0.0) ; 

/*  clear  activity  and  output  variables  of  all  nodes  */ 
loopi (K_TOTAL_DFS) { 
n_list [ i 1 . activity=0 . 0; 
n_list [i 1 . output“0 . 0; 

) 

/*  initialize  input  to  all  nodes  */ 
loopi (K_TOTAL_DFS) 

n_list [ i ]  . input “array [k*K_TOTAL_DFS+i ]  ; 

/*  compute  the  activity  and  output  of  all  nodes  in  current  time  slice  */ 
loopj (time) { 
loopi (K_TOTAL_DFS) 

compute_activity (n_list, Sn_list [i] ) ; 
loopi (K_TOTAL_DFS) 

compute_output (tn_list [ i ] , atof (argv [ 3 ) ) ) ; 

) 

/*  copy  the  re.su  It  to  outarray[l  */ 
loopi (K_TOTAL_DFS) 

out array  tk*K_TOTAL_DFS+i 1 =n_list [ i ) . output  ; 

/**•♦«  print  statements  ••*•••••*•»/ 

/•  loopi (K_TOTAL_DFS) ( 

print f ( "Node :  %d  Array:  %7.4f  Input;  %7.4f 
.i^ctivity:  %7.4f  Output:  %7.4f\n", 
i  ,  array [k*K_TOTAL_DFS+i ] , n_list (i 1 . input, 
n_list [i ] . activity, n_list [ i 1 . output ) ; 
printf ("Hit  ENTER  to  continue"); 
gets (in_string) ; 

)*/ 

/•••••••*i>«*«****«*««***<*****<*<*t/ 

count  =0; 

/*  unsquash  result  */ 
loopi (K_TOTAL_DFS) { 

temp* squash (outarray (k*K_TOTAL_DFS+i 1 ,1.0,0. 0,max,  0)  ; 
if  (temp<10 . 0) 

outarray (k*K_TOTAL_DFS+i ) =0; 
else  ( 

outarray [k*K_TOTAL_DFS+i ] “temp; 
count  +=1; 

) 

) 

printf ("In  time  slice  %d  there  were  %d  survivors\n",k, count) ; 

if (count>SURVIVORS) ( 

loopj ( (count  —SURVIVORS) ) { 

min*outarray [k*K_TOTAL_DFS]  ; 

index“0; 

printf ("  Cutting  %d  survivors  to  %d\n", count, SURVIVORS)  ; 
loopi (K_TOTAL_DFS) ( 
if (outarray [k*K_TOTAL_DFS+i 1 <min) { 
min”Outarray [k*K_TOTAL_DFS+i )  ; 
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index“i ; 


) 

) 

out  array  tk*K_TOTAL_DFS+index]  -^O; 

} 

) 

tot  +“Count; 

printf ("Total  now  is  %d\n’,tot); 

) 

/...*.>.»»>.*..*.*****•***.*  gnci  of  foop  *•*******•»*****•***«*“.•*«**/ 

/*******  print  statements  *****»***/ 

/*  printf ("ARRAY: \n") ; 
loopi  () 

printf ("%3 . 2f  ",  array [i ]) ; 
printf ("\n")  ; 
printf ("OUTPUT:\n") ; 
loopi (K_TOTAL_DFS) 

printf ("%3.2f  ", n_list [ i ] . output ) ; 
printf ("Nn") ; 
printf ("OUTARRAY; \n") ; 
loopi (size) 

pr int f ( " % 3 . ? f  ",  out array ( i 1 )  ; * / 

y**************«*»*****«***k***ttft**^ 

write_output (out array, argv[l) ,size,atoi (argv [2]) , argv [ 3) ) ; 
printf ("The  average  no.  of  survivors  per  window  is  %5.3g,  and  RC=%5.3g\n", 
(float)tot/  (float)  (2»M_T0TAI,_DTS)  , 

(float) (2*M_T0TAL_DTS*K_T0TAL_DFS) / (float)tot) ; 

/*  write_result (outarray, argv [ 1 ) ) ;  */ 
f close (fin) ; 

) 
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Appendix  E.  Code  for  Haar  Wavelet  Decomposition,  Compression,  and 

Reconstruction 


wavelet. m 

/*  Written  by  Rick  Ricart,  Capt,  OSAF  */ 

/*  This  program  computes  the  Haar  wavelet  decomposition, compression,  */ 

/*  and  reconstruction  of  a  time  signal  */ 

/•*•*********•*****•**•*•«••••*•«. «t«*««<«««»«***<i***«*«***********«**ii*««/ 


limport  <sound/sound.h> 

# import  <math.h> 

limport  <dsp/arrayproc . h> 

limport  <mach.h> 

limport  <stdlib.h> 

limport  <stdio.h> 

limport  <ob jc/ob jc . h> 

limport  <soundkit/Sound . h> 

limport  <soundkit/soundkit .h> 

limport  <string.h> 

limport  <macros.h> 

Idefine  N_DELTA_T  256 
•define  K_TOTAL_DFS  128 
Idefine  M_TOTAL_DTS  31 

Idefine  TOT  M_TOTAL_DTS*N_DELTA_T  /*  approximately  one  sec  of  sound  */ 

Idefine  PI  3.141592654 

Idefine  LEVELS  8 

Idefine  MAX  1 

Idefine  MIN  1 

Idefine  COMPGAIN  10.0 

Idefine  FO  0.25 

Idefine  TAU  0 . 1 

Idefine  EPS  1 . 0 

Idefine  ABS(x)  sqrt  (pow  (x,  2 . 0)  ) 

Idefine  ABS2  (x)  (  (float)  sqrt  ((  (double)  ;-;)*(  (double)  x)  )  ) 

Idefine  SQUARE (x)  ( (x) *  (x) ) 

Idefine  SIGMOID (x)  (SQUARE (x) )/ (SQUARE (x) +F0) 

/*  Global  Variables  */ 
id  mySound,  newSound; 

SNDSoundStruct  ‘soundStruct,  * convert Struct,  ‘shortStruct; 

/*  the  filenames  */ 

char  newsoundfile (80] , shortsoundf ile [80] ; 

void  compute_trans ( ) ; 
float  compute_iproduct  0 ; 

typedef  struct  grid_element  { 
struct  grid_element  *r_ptr; 
stiuct  grid_element  *l_ptr; 
struct  grid_element  ♦rleg_ptr; 
struct  grid_element  *lieg_ptr; 
struct  grid_element  •up_ptr; 
float  iproduct_va 1 , 
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float  input; 
float  activity; 
float  output; 
int  type; 

}  grid_element; 

typedef  struct  { 
grid_element  ‘head,  ‘current; 

(  list; 

typedef  strict  { 
float  ‘wavelet; 

1  wavelets; 

/.****«».»*.****.***»  get_data()  ‘“““*»‘“»»«******»*«“***“**‘“/ 

/‘  This  procedure  opens  and  reads  the  signal  values  of  a  sound  file  ‘/ 

/•***«*<>***********«*****«*«**ii**««********«********«**«*****<**«*«**^ 

void  get_data (short  “temp, short  “temp2,chat  ‘infile) 

{ 

int  error,  data_size,i; 

BOOL  edit; 

/* 

inPtr  =  (short  ‘)  ((char  *)  inputSound  +  inputSound->dataLocation) ; 
outPtr  =  (short  ‘)  ((char  ‘)  ‘outputSound  +  (‘outputSound) ->dataLocation) 

*/ 

error  =  [mySound  readSoundfile : infile) ; 

/*  initialize  mySound  to  infile's  mySound  object  •/ 
soundStruct  = [mySound  soundStruct 1 ; 

[mySound  isEditable); 

data_size  •  soundStruct->dataSize; 

SNDAlloc (iconvertStruct , data_size, SND_F0RMAT_LINEAR_1 6, SND_RATE_CODEC, 
soundStruct->channelCount, "") ; 

SNDAlloc (Sshort3truct,data_size, SND_F0RMAT_LINEAR_16, SND_RATE_CODEC, 
soundStruct->channelCount, "") ; 

SNDConvertSound (soundStruct, sconvertStruct) ; 

SNDCon vert Sound (soundStruct , S short Struct ) ; 

*temp=  (short  ‘ )  (convertStruct+convertStruct->dataLocation)  ; 

*temp2=  (short  ‘ ) (shortStruct+shortStruct->dataLocation) ; 

) 

list  ‘majce  grid() 

{ 

list  ‘temp; 

MALLOC (temp, 1, list, "Allocating  list")  ; 

temp->head=NULL; 

temp->current“NULL; 

/*  printf("Made  ma)ie_grid  successfullyNn")  ;‘/ 
return  temp; 

) 


void  al locate_node (list  ‘1,  grid_eleraent  ‘node) 
{ 

if  (l->head— NULL)  { 
l->head=node; 
l->current“node; 
node->l_ptr=node->r_ptr”NULL; 

) 

else  ( 

node->l_ptr”l->current; 

l->current->r_ptr-node; 

1 ->cur rent-node ; 

) 

node->rleg_ptr«node->lleg_ptr=NULL; 

node->up_ptr-NULL; 
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) 


void  allocate_grid_triad (list  *1,  grid_element  ‘element) 

{ 

gr id_element  ‘tempi , ‘temp2 ; 

/*  printf ("\nAllocating  a  triad\n");‘/ 

/*  allocate  both  legs  ‘/ 

MALLOC (tempi, 1 , grid_element, "allocating  lleg") ; 

MALLOC (temp2, 1 , grid_element, "allocating  rieg") ; 

templ->type=0; 

temp2->type=l; 

element->lleg_ptr»'templ ; 

element->rleg_ptr»temp2; 

templ->r_jptr=temp2; 

temp2->l_ptr«templ ; 

templ->up_pt r=temp2->up_ptr=element; 

if  (element->l_ptr“=NULL I  I e lement->l jptr->rleg__ptr“NULL) 

templ->l_ptr=NULL; 

else  { 

teir,pl->l_ptr=element->l_ptr->rleg_ptr; 
element->l_ptr->rleg_ptr->r_ptr=templ ; 

) 

t  emp2  -  >  r_ptr=NULI,; 

templ->rleg_ptr“temp2->rleg_ptr=templ->lleg_ptr“temp2->lleg_ptr”NULL; 

l->current=element; 

/‘  printf ( "allocated  grid  triad  successfully\n") ; 

) 


void  inner_loop (int  level,  list  ‘1,  grid_element  ‘element,  wavelets  ‘wl, 
float  ‘signal, float  ‘new_signal, 
int  m,  int  points) 

( 

int  elements; 

elements= (int) pow  (2 .0,  (double)  (LEVELS-level) ) ; 

if  (level==0) 

return; 

else 

if (element->r_ptr==NULL) { 
points  -=elements; 

/‘  printf ("Moving  up  a  level,  moving  element  ptr  up,  and  inner  loop.\n");‘/ 
return  inner_loop (--level, 1, element=element->up_ptr, 
wl ,  signal, new_3ignal, m, points) ; 

) 

else  ( 

/‘  printf ( "Moving  element  ptr  right  and  allocating  another  triad\n");‘/ 
points  +=elements; 

compute_tran3 (level, l,element”element->r_ptr,wl, signal, 
new_signal ,m, points) ; 

) 

) 


void  regenerate_signal (float  ‘new_signal,  wavelets  ‘wl,int  elements, 
int  points,  int  m,  float  mag) 

{ 

int  i; 

loopi (elements) { 

new_signal (m‘N_DELTA_T+i+point3 )  +”wl->wavelet (i) ‘mag; 

/*  printf ("%5.3g\n", new_signal [m‘N_DELTA_T+i ‘points ] )  ;  ‘ / 

) 

) 
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y**********************************«********«********************/ 


void  conipute_trans  (int  level,  list  *1,  grid_element  ‘element,  wavelets  ‘wl, 
float  ‘signal, float  ‘new_signal,  int  m,  int  points) 

( 

float  top, left, right, epsilon; 
int  elements; 

elements” (int) pow (2 . 0,  (double)  (LEVELS-level) )  ; 

/*  ONE  ‘/ 

allocate_grid_triad (1,  element) ; 

/‘  TWO  ‘/ 

/********•*****  added  this  section  •**••*»«*»*****»*»»/ 

if (element->up_ptr==NULL) { 

top”Compute_iproduct (Swl [level] , Ssignal (m‘N_DELTA_T+points ] ,  elements) ; 
element->iproduct_val”top; 
e lement-> input =t op; 

regenerate_signal  (new_signal, Swl [level] , elements, points, m, top) ; 

) 

/*  top”Compute_iproduct (Swl [level] , Ssignal [m*N_DELTA_T+points] , elements) ; */ 
top=element->iproduct_val; 

element->iproduct_val«top; 

left=compute_iproduct  (Swl  [  level  +  1  ]  ,  Ssignal  [m*N_DEI,TA_T+points]  ,  elements/2)  ; 
element ->lleg_ptr->iproduct_val=left; 
element->lleg_ptr->input=left  ; 
right”compute_iproduct (swl [level+1 1 , 

Ssignal [m*N_DELTA_T+points+elements/2 ] , elements/2) ; 
element->rleg_ptr->iproduct_val”right; 
eleraent->rleg_ptr->input “right ; 

/*  printf ( "ip=%5 . 3g  i”%5.3g  lip=%5.3g  li=%5.3g  rip=%5.3g  ri%5 . 3g\n\n" , 
element->iproduct_val, element -> input, 

element->lleg_ptr->iproduct_val ,  e lenient ->lleg_ptr-> input , 
element->rleg_ptr->iproduct_val , element->rleg_ptr-> input ) ; */ 
added  this  section  »*»•*****•*****•**•/ 
regenerate_signal (new_signal, Swl [level  +  1] , 
elements/2, points, ra, left) ; 
regenerate_signal (new_3ignal , Swl [ level+1 ] , 
elements/2, points+elements/2,m, right) ; 

if (level==LEVELS-2) { 

inner_loop (level , 1 .element, wl, signal, new_signal,m, points) ; 

) 

else  ( 

return  compute_trans (++level, 1, element=element->lleg_ptr, wl, signal, 
new_signal,m, points) ; 

) 


) 

/.«.**...****.<*«<>•**..*  get_wavelet s ()  *****♦*•*••*•*♦»**•••••••••*••**/ 

/*  This  procedure  creates  the  wavelet  function  for  each  resolution  used  ‘/ 

void  get_wave lets (int  level,  wavelets  *w) 

( 

int  1, elements; 

FILE  ‘fl; 

/*  printf ("\nCreating  wavelets  at  resolution  %d\n”, level );* / 
elements”  (int) pow (2.0,  (double)  (LEVELS-level) ) ; 

/*  printf ("No.  of  elements  is  %d\n", elements) ; ‘/ 

MALLOC (w->wavelet , elements, float, "allocating  wavelets") ; 
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loopl (elements) { 
if  (l<elements/2) 

w->wavelet [1 ] “ (float) sqrt ( (pow (2 . 0, (double) level) ) ) ; 
else 

w->wavelet (1) (float) sqrt ( (pow (2.0, (double) level) ) )  ; 

) 

/»  loopl (elements) { 

w->wavelet (1 ] ” (float) (sqrt (pow (2.0, (double) level) ) *sqrt (2.0)* 
sin  (2 . 0*Pl*pow  (2.0,  (double)  level)  * 

(double) 1/ (double)N_DELTA_T) ) ; 

) 

if  ((fl  -  fopen("wavelet.dat", "a") )  —  NULL)  ( 
printf ("\n***  Can't  create  %s  ***", "wavelet. dat") ; 
exit  (0)  ; 

) 

loopl (elements) 
printf (”%f\n",w->wavelet [I] ) ; 
fprintf (f 1 , "\n") ; 
f close  (f 1 )  ;  */ 

) 


/•••**•••««•*  compute_iproduct  •••»«»•*»«••»»»•••««*•/ 

/*  Tills  function  produces  tlie  inner  product  of  the  signal 
samples  at  a  particular  time  shift  and  resolution*/ 

float  compute_iproduct (wavelets  *w,  float  *signal,  int  elements) 
{ 

int  i  ; 
float  val; 

val*0; 

loopi (elements) 

val  +=w->wavelet [i 1 ‘signal (i 1 ; 
val  /=N_DELTA_T; 
return  val; 


) 

float  squa-  (float  val,  float  oldmax,  float  oldmin,  float  newmax,  float  newmin) 

( 

float  answer; 

answer  =  (  ( (val -oldmin)  /  (oldmax- oldmin)  )  *  (newmax-newmin'  -* newmin; 
return  answer; 

) 


void  compute_activity (grid_element  ‘element,  int  level, int  t,  int  which) 
{ 

float  a,exc,inh; 
char  stringtlO); 
grid_element  ‘left, ‘right; 
int  1; 

a*element->act ivity ; 
inh*0; 

exc“element->output+ABS2 (element->input) ; 

lef t“right“element ; 

if (level>which) 

loopl (level/2) ( 

if  (left->ljptr!-NULL)  ( 

left“lef t->l_ptr ; 

inh  +”left->output ; 

) 
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if (right->r_ptr!-NULL) { 
right«right->r_ptr; 
inh  +»right->output; 

) 

} 

element->activity»a+TAU*EPS* (-a+ (MAX-a) ‘exc- (MIN+a) *inh»COMPGAIN) 
/*  printf  (“activity“%5 . 3g\n\n",  eleinent->activity)  ; 
printf(”Hit  enter  to  continue"); 
gets  (string)  ;  */ 

) 

void  conipute_output  (grid_element  "element) 

{ 

float  a; 

a=-element->activity  ; 

/*  if(a<0) 

element->output=0; 

else 

if  (a>=04Sa<=l) 

element->output=a; 

else 

element->output=l ; * / 

eleraent->outpuf=(a-.0009<0) ?0; (SIGMOID (a- . 0009) ) ; 

) 


main  (int  argc,  char  **argv) 

( 

short  "temp, *temp2;  /*  integer  representation  of  original  data  */ 
list  *grid_list; 

grid_element  "node, *current_node, *top_node; 

wavelets  "wavelet_list; 

int  level , 1 , m, n, i ; 

int  tot_points,  points, elements ; 

float  signal (TOTl ; 

float  newsignal [TOT] , shortsignal (TOT) ; 
float  max, min; 
size_t  s_len; 

FILE  *fl, "f2, "f3, *f4, "f5; 
char  stringdO); 

my Sound* [ Sound  new] ; 

/*  get  signal,  temp  has  integer  representation  of  signal  "/ 
get_data (4 temp, 4temp2, argv [ 1 ) ) ; 
printf ("Got  data  successf ully ! \n" ) ; 
loopm (TOT) 

signal [m] « (float) temp [m] ; 

if  ( (fl  “  fopen ("signal .dat", "w") )  --  NULL)  ( 
printf ("\n""*  Can't  create  %s  """", "signal .dat") ; 
exit  (0)  ; 

) 

loopm (TOT) 

fprintf(fl,"%f\n", signal (m) ) ; 
grid_list“ma)ce_grid  0  ; 

MALLOC (node, M_TOTAL_DTS, grid_element, "allocating  first  node"); 
MALLOC (wavelet_list, LEVELS, wavelets, "allocating  wavelet  list")  ; 
loopl (LEVELS) 

get_wavelets (1, 4wavelet_li3t (1 ) ) ; 

/*  printf ("Going  into  main  loop\n") ;  */ 

loopm (M_TOTAL_DTS) ( 

al locate_node (grid_list , 4node [ra] ) ; 

/"  printf (“\ntime  is  %d  going  into  compute_trans\n",m) ;  "/ 

level-0; 

tot_points-0; 
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conipute_trans  (level, grid_list,  *node  [m]  ,wavelet_list,  signal, 
newsignal, m, tot_points) ; 

) 

printf ("starting  squashXn"); 

/***•  This  loop  finds  max  at  each  time  slice  and  uses  max  to  squash  •»*•»•/ 

loopm (M_TOTAL_DTS) ( 

top_node“*node [m) ; 

current_node“top_node; 

grid_list->current-top_node; 

max»min“0; 

/*  this  loop  checks  for  max  value  in  time  slice  */ 
loopl (LEVELS) ( 
points“0; 

elements- (int) pow (2 .0, (double) (LEVELS-!)); 
loopn (N_DELTA_T/elements) ( 
if (current_node->input>max) 
max-current_node->input ; 
if (current_node->input<min) 
min-current_node->input : 
i f (current_node->r_ptr  !=  NULL) { 
current_node”Current_node->r_pt r ; 

) 

) 

if  (grid_list->curr;.nt->lleg_ptr  !-  NULL)  ( 
grid_list->current-grid_list->current->lleg_ptr ; 
current_node-grid_list->current ; 

) 

1 

/•  end  of  check  max  loop  •/ 
if (max<ABS2 (min) ) 
max-ABS2 (min) ; 

printf ("Max”%f  min“%f  at  time  %d - Starting  squash. .. \n", max, min, m) ; 

/•  this  loop  squashes  all  values  between  using  max  as  old  max  */ 

top_node”4node (ml ; 

current_node=top_node; 

grid_list->current“top_node; 

loopl (LEVELS) { 

points-0; 

elements- (int) pow (2 . 0, (double) (LEVELS-1 ) ) ; 
loopn (N_DELTA_T/elements) { 

current_node->input»squash (current_node->input , max, -max,  1 . 0,  -1 . 0) ; 

/*  printf ("after  squashing;  iproduct=%5 . 3g  input=%5. 3g\n\n", 
current_node->iproduct_val , cur rent_node-> input ) ; 
printf ( "Press  enter  to  continue!"); 
gets (string) ; * / 

if (current_node->r_ptr  !-  NULL) ( 
current_node-currcnt_node->r_ptr; 

) 

) 

if (grid_list->current->lleg_ptr  !-  NULL) { 
grid_list->current”grid_list->curren.->lleg_ptr; 
current_node-grid_list->current ; 

) 

) 

/*  end  of  squash  loop  */ 

) 

printf ("end  of  squash  loop/n"); 


/••<••«•*•••«••  the  following  computes  lateral  inhibition  ♦**»*•**»»»***•*•/ 


loopm (M_TOTAL_DTS) ( 

printf ( "starting  LIN  of  time  %d\n",m); 
loopi (50) ( 
top_node-*node (m)  ; 
current_node”top_node; 
grid_list->current-top_node; 
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loopl (LEVELS) { 

elements" (int) pow (2.0, (double) ( LEVELS- 1) ) ; 
loopn (N_DELTA_T/elements) ( 

compute_activity (current_node, 1, i.atoi (argv[21 ) )  ; 
if (current_node->r_ptr  !=  NULL) ( 
current_node“Current_node->r_ptr; 

) 

) 

if (grid_list->current->lleg_ptr  !-  NULL) ( 
grid_list->current»grid_list->current->lleg_ptr; 
cur rent_node“gri d_list->cur rent; 

) 

) 

top_node“tnode [m] ; 
current_node"top_node ; 
grid_li3t->current“top_node; 
loopl (LEVELS) { 

elements- (int) pow (2 . 0, (double) (LEVELS-1) ) ; 
loopn (N_DELTA_T/elements) ( 
compute_output  (current_node) ; 
if (current_node->r_ptr  !«  NULL) { 
current_node=current_node->r_ptr ; 

) 

) 

if (grid_list->current->lleg_ptr  !=  NULL) { 
grid_list->current-grid_list->current->lleg_ptr; 
current_node"grid_list->current ; 

) 


) 


) 

) 

/•••••••««•••••>••  end  of  lateral  inhibition  *••*•*•**•*••*»••••»••*•**••••••• 

/************  the  following  regenerates  signal  from  compressed  grid  •*»**••/ 


loopm(M_TOTAL_DTS) ( 
top_node"&node (m) ; 
current_node"top_node; 
grid_list->current"top_node; 
loopl (LEVELS) ( 
if  (m==0iSl— 0)  ( 

printf ( ”top-%5 . 3g  lleg=%5.3g  rleg-%5 . 3g\n", 

current_node->iproduct_val , current_node->ileg_ptr->iproduct_val, 
current_node->rleg_ptr->iproduct_val) ; 
printf  ("newtop=%5 . 3g  newlleg=%f5. 3g  newrleg=%f .  3g\n'', 
current_node->output , current_node->lleg_ptr->output , 
current_node->rleg_ptr->output ) ; 

) 

points-0; 

elements" (int) pow (2 . 0, (double) (LEVELS-1) ) ; 
loopn (N_DELTA_T/elements) ( 
if (current_node->output ! -0) 

regenerate_signal ( short s i gna 1 , 4wavelet_list [ 1 ) , 
elements,  points,  m,  current_node->iproduc*’_val )  ; 
else 

regenerate_signal (short signal, twavelet_list (1 ) , 
elements, points, m, 0) ; 
points  +«elements; 
if (current_node->r_ptr  !-  NULL) { 
current_node“Current_node->r_ptr ; 

) 

) 

if  (grid_list->currerit->lleg_ptr  !-  NULL)  ( 
grid_list->current“grid_list->current->lleg_ptr; 
current_node-grid_list->current ; 
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) 

) 

) 

printf ("regenerated  short  signal\n") ; 

/**************«********************************«*****************************/ 
/**********  the  following  outputs  the  grid  and  compressed  grid  outputs  ••*****/ 
if  ({f3  «  fopen ("grid.dat", "w") )  «»  NULL)  { 
printf ("\n***  Can't  create  %s  ***", "grid.dat") ; 
exit  (0) ; 

) 

if  ( (f4  «  fopen ("compreased.dat", "w") )  -=  NULL)  { 
printf ("\n***  Can't  create  %s  ***", "compressed.dat") ; 
exit  (0)  ; 

) 

loopm(M_TOTAL_DTS) { 

top_node«tnode [mj ; 

cur  r  ent_node”t  op_node ; 

grid_list->current”top_node; 

loopl (LEVELS) ( 

points=0; 

elements' (int) pow (2 . 0, (double) (LEVELS-l) ) ; 
loopn (N_DELTA_T/elements) ( 
if {current_node->output ! =0) 

fprintf(f4,"%5.3g  ",  current_node->iproduct_val) ; 
else 

fprintf (f 4, "%5 . 3g  ",0); 

fprintf (f3, "%5 . 3g  ",  current_node->iproduct_val) ; 
if  (current_node->r_ptr  '.=  NULL)  ( 
current  node«current_node->r_ptr; 

) 

) 

fprintf (f3, "\n") ; 
fprintf  (f 4, "\n") ; 

if (grid_list->current->lleg_ptr  !=  NULL) { 
grid_list->current=grid_list->current->lleg_ptr; 
current_node'grid_list'>current ; 

) 

) 

) 

printf ("outputted  grid  and  compressed  grid  data\n"); 

^ 

if  ( (f2  “  f open ( "new_signal . dat " , "w" ) )  NULL)  ( 
printf ("\n***  Can't  create  %s  • ‘ , "new_signal . dat") ; 
exit  (0) ; 

) 

loopm (TOT) 

fprintf(f2,"%£\n", newsignal (mJ ) ; 

if  ( (f5  '  fopen ("short_signal.dat", "w") )  "  NULL)  ( 
printf ("\n***  Can't  create  %s  ***", "short_signal.dat") ; 
exit (0) ; 

) 

loopm (TOT) 

fprintf (f 5, "%f\n", short signal [m] ) ; 
loopi (TOT) { 

temp ( i ] - (short ) newsignal [i ) ; 
temp2 [ i ) “ (short) short signal ( i] ; 

) 

/*  get  length  of  input  file  not  counting  null  terminator  */ 
s_len'strlen (argv [ 1 ) ) ; 

/*  start  name  of  newsoundfile  with  "new_"  */ 
strcpy (newsoundfile, "new_") ; 

/•  start  name  of  shortsoundf ile  with  "short_"  •/ 
strcpy (shortsoundfile, "short_") ; 

/*  concatenate  input  file  name  to  "new_”  in  newsoundfile ( 1  */ 
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strcat (newsoundfile,argvIl]) ;  /*  newsound [ l-new_f ilename . and  */ 

/*  concatenate  input  file  name  to  "short_"  in  shortsoundfile [ )  */ 
strcat (short3oundfile,argv[l] ) ;  /•  short sound [l»3hort_filename.snd  */ 
SNDWriteSoundfile (newsoundfile, convertStruct)  ; 

SNDWriteSoundfile (shortsoundfile, shortStruct) ; 

fclose (f 1) ; 
fclose  (f2)  ; 
fclose (f 3) ; 
fclose (f 4) ; 
fclose  (f 5) ; 

) 
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Appendix  F.  Code  for  Morlet  Wavelet  Decomposition,  Compression,  and 

Reconstruction 


/••*•«**••****»********•*  morlet_wavelet .m  »*»«»*•••****••»****»*»•/ 
/*  Written  by  Rick  Ricart,  Capt,  OSAF  */ 
/*  This  program  computes  the  Morlet  wavelet  decomposition, compression,  */ 
/*  and  reconstruction  of  a  time  signal  */ 


♦import  <sound/sound. h> 
♦import  <math.h> 

♦import  <dsp/arrayproc.h> 
♦import  <mach.h> 

♦import  <stdlib.h> 

♦import  <stdio.h> 

♦import  <ob jc/ob jc .h> 

♦import  <soundkit/Sound.h> 
♦import  <soundkit/soundkit .h> 
♦import  <string.h> 

♦import  <macros.h> 


♦define  N_DELTA_T  256 
♦define  K_TOTAL_DFS  128 
♦define  M_TOTAL_DTS  31 

♦define  TOT  M_TOTAL_DTS*N_DELTA_T  /*  approximately  one  sec  of  sound  */ 

♦define  PI  3.141592654 

♦define  LEVELS  0 

♦define  OVERLAP  2 

♦define  MAX  1 

♦define  MIN  1 

♦define  COMPGAIN  10.0 

♦define  FO  0.25 

♦define  TAU  0.1 

♦define  EPS  1.0 


♦  define  ABS{x)  sqrt  (pow  (x,  2 . 0)  ) 

♦define  ABS2 (x)  ( (float ) sqrt ({ (double) x) *( (double) x) ) ) 

♦define  SQUARE (X)  ( (x) * (x) ) 

♦  define  SIGMOID  (x)  (SQUARE (x) )/ (SQUARE (x) +F0) 


/*  Global  Variables  */ 
id  mySound,  newSound; 

SNDSoundStruct  ‘soundStruct,  *convertStruct,  ‘shortStruct ; 
/*  the  filenames  */ 

char  newsoundf ile [80] , shortsoundf ile ( 80) ; 

void  compute_trans  ()  ; 
float  compute_iproduct 0 ; 

typedef  struct  grid_eiement  { 
struct  grid_element  ‘r_ptr; 
struct  grid_element  *l_ptr; 
struct  grid_element  *rleg_ptr; 
struct  grid_element  *lleg_ptr; 
struct  grid_element  *up_ptr; 
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float  iproduct_val; 
float  input; 
float  activity; 
float  output; 
int  type; 

)  gridi_element; 

typedef  struct  { 
grid_element  ‘head,  ‘current; 

)  list; 

typedef  struct  { 
float  ‘wavelet; 

)  wavelets; 

/***••***************  get_data  {)  »‘“*»**»*»**»»»»«»*»»* ***•»********/ 

/‘  This  procedure  opens  and  reads  the  signal  values  of  a  sound  file  ‘/ 

/****«*«*****************«****«****************«t**************tk******/ 

void  get_data (short  “temp, short  “temp2,char  ‘infile) 

( 

int  error,  data_size,i; 

BOOL  edit ; 

/* 

inPtr  =  (short  ‘)  ((char  ‘)  inputSound  +  inputSound->dataLocation) ; 
outPtr  ”  (short  *)  ((char  ‘)  ‘outputSound  +  (‘outputSound) ->dataLocation) 

*/ 

error  =  [mySound  readSoundf ile : inf ile ) ; 

/*  initialize  myoound  to  infile's  mySound  object  ‘/ 
soundStruct  -(mySound  soundStruct ) ; 

[mySound  isEditable]; 

data_size  =  soundStruct->dataSize; 

SNDAlloc (tconvertStruct,data_size, SND_F0RMAT_LINEAR_16,  SND_RATE_CODEC, 
soundStruct->channelCount, "") ; 

SNDAlloc (43hortStruct,data_size,SND_F0RMAT_LINEAR_16,SND_RATE_C0DEC, 
soundStruct->channelCount, "") ; 

SNDConvertSound (soundStruct, SconvertStruct ) ; 

SNDConvert Sound (soundStruct, ishortStruct) ; 

*temp=  (short  ‘ )  (convertStruct  +  convertStruct->dataLocation)  ; 

*temp2”  (short  ‘ ) (shortStruct+shortStruct->dataLocation) ; 

) 

list  ‘make_grid() 

{ 

list  ‘temp; 

MALLOC (temp, 1, list, "Allocating  list") ; 

temp->head“NULL; 

terap->current«NULL; 

/‘  printf("Made  mal«e_grid  successfullyNn")  ; ‘/ 
return  temp; 

) 


void  allocate_node (list  ‘1,  grid_element  ‘node) 
( 

if (l->head==NULL) { 

l->head=node; 

l->current=node; 

node ->l_pt r=node->r_ptr “NULL; 

) 

else  ( 

node->l_ptr»=l->current; 

l->current->r_ptr=node; 

1 ->cur rent “node; 

) 

node->rleg_ptr“node->lleg_ptr“NULL; 
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node->up_ptr«NULL; 

) 

void  allocate_grid_triad (list  *1,  grid_element  ‘element) 

( 

gr id_element  ‘tempi , ‘temp2 ; 

/*  printf {"\nAllocating  a  triadSn" ) ; ‘ / 

/‘  allocate  both  legs  ‘/ 

MALLOC (tempi , 1 , grid_element, "allocating  lleg") ; 

MALLOC (temp2, 1 , grid_element , "allocating  rleg") ; 

tempi ->tyr'»-0; 

temp2->ty, 2=1; 

element->lleg_ptr“templ ; 

element->rleg_ptr”temp2; 

templ->r_ptr“temp2; 

temp2->l_ptr=templ ; 

t  empl ->up_pt r-t  emp2 ->up_pt  r-el ement ; 

if (element->l_ptr==NULL| I element->l_ptr->rleg_ptr==NULL) 
temp 1 - > l_pt  r “NULL ; 
else  ( 

templ->l_ptr-element->l_ptr->rleg_ptr; 
element->l_ptr->rleg_ptr->r_ptr«templ  ; 

) 

t  emp2 - >  r_ptr=NULL; 

templ->rleg_ptr-temp2->rleg_ptr«templ->lleg_ptr-temp2->lleg_ptr=NULL; 
l->current=eleraent ; 

/•  printf ( "allocated  grid  triad  successfullyXn") ; ‘/ 

) 

void  regenerate_signal (float  *new_signai,  wavelets  *wl,int  elements, 
int  points,  int  m,  float  mag) 

{ 

int  i,tot; 

if (OVERLAP!=0) 

t ot =e 1 ement s * 2 ‘OVERLAP ; 

else 

tot*elements; 
loopi (tot) ( 

new_signal [m*N_DELTA_T+i+points-tot /2 )  +“wl->wavelet ( i ] ‘mag; 

/*  printf ("%5.3g\n", new_3ignal [m*N_DELTA_T+i ‘points) ) ; ‘/ 

) 

) 


void  inner_loop (int  level,  list  *1,  grid_element  ‘element,  wavelets  *wl, 
float  ‘signal , float  ‘new_signal, 
int  ra,  int  points) 

( 

int  elements; 

elements*  (int)  pow  (2.0,  (double)  (IXVELS- level)  )  ; 

if (level“=0) 

return; 

else 

if (element->r_ptr“”NULL) ( 
points  —elements; 

/‘  printf ("Moving  up  a  level,  moving  element  ptr  up,  and  inner  loop.\n");‘/ 
return  inner_loop (--level, 1, element ”element->up_ptr, 
wl ,  signal, new_signal, m, points) ; 

) 

else  { 

/*  printf ("Moving  element  ptr  right  and  allocating  another  triad\n") ; ‘/ 
points  ‘“elements; 
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conipute_tran3  (level,  1,  element«element->r_ptr, wl,  signal, 
new_signal,m, points)  ; 

) 

) 


/f**t**»t*****t*»»i,  compute_trans  ()  *♦•»»*«»•«»••**»••*•»***•***/ 

void  compute_trans (int  level,  list  *1,  grid_element  ‘element,  wavelets  *wl, 
float  ‘signal, float  ‘new_signal,  int  m,  int  points) 

{ 

float  top, left, right, epsilon; 
int  elements; 

elements” (int) pow (2 . 0, (double) (LEVELS-level) ) ; 

/*  ONE  ‘/ 

allocate_grid_triad (1,  element); 

/‘  TWO  */ 

/<«•••******•*«  added  this  section  *••••**•*•****♦***•/ 

if (element->up_ptr“”NULL) { 

top=compute_iproduct (Swl [ level] , isignal (m‘N_DELTA_T+pointsl , elements) ; 

eleraent->iproduct_val”top; 

element-> input =t op; 

regenerate_signal  (new_3ignal, Swl [level] , elements, points,  m,  top)  ; 

) 

/ *  top=compute_iproduct (swl [level ] , S signal [m‘N_DELTA_T+points] ,  elements)  ;  * / 
top”element->iproduct_val; 

/A****************************************************/ 

element->iproduct_val«top; 

lef t=compute_iproduct (swl [ level  +  1 ] , s signal [m‘N_DELTA_T+points ] ,  elements/2) ; 

element->lleg_ptr->iproduct_val=left ; 

element->lleg_ptr->input”lef t  ; 

right”Compute_iproduct (Swl [level+1] , 

s  signal [m‘N_DELTA_T+points+elements/2 ] , eleraents/2) ; 

element->rleg_ptr->iproduct_val=right ; 

element->rleg_ptr-> input “right ; 

/*  printf ("ip=%5 . 3g  i”%5.3g  lip=%5.3g  li=%5.3g  rip=%5.3g  ri%5 . 3g\n\n" , 
element->iproduct_val, element -> input , 

element ->lleg_ptr->iproduct_val, element ->lleg_ptr->input, 
element->rleg_ptr->iproduct_val , e lement->rleg_ptr-> input ) ; * / 

ADDED  THIS  SECTION  «*»*»•••*••♦•*****•/ 
regenerate_signal (new_signal, Swl [ level+1 ] , 
elements/2, points, m, left) ; 

regenerate_signal (new_signal , Swl [ level  +  1 ] , 
elements/2, points+elements/2,m, right) ; 

if (level”“LEVELS-2) ( 

ir ner_loop (level, 1, element, wl, signal, new_signal,m, points)  ; 

) 

else  ( 

return  compute_trans (++ level, 1 , element “element ->lleg_ptr, wl, signal, 
new_signal ,m, points) ; 

) 


) 

get_wavelets  ()  •••*••*•**♦•***********•*******./ 

/*  This  procedure  creates  the  wavelet  function  for  each  resolution  used  */ 

Z************************************************************************^ 

void  get_wavelet3 (int  level,  wavelets  ‘w,  wavelets  *iw) 

{ 

int  1 , elements, tot ; 
double  time, omega, cc; 
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FILE  *fl, *f2; 


/*  These  are  Morlet  wavelet  parameters  */ 

printf  (''\nCreating  wavelets  at  resolution  %d\n”,level); 

elements=  (int)  pow  (2 . 0,  (double)  (LEVELS-leveD); 

if  (OVERLAP  !=0) 

t ot=elements* 2 ‘OVERLAP; 

else 

tot=elements; 

/*  printf ("No.  of  elements  is  %d\n", elements) ; */ 

MALLOC (w->wavelet,tot, float, "allocating  wavelets") ; 

MALLOC (iw->wavelet,tot, float, "allocating  wavelets") ; 
omega=PI  *sqrt  (2 . 0/log  (2 . 0)  )  ; 
cc=exp (-pow (omega, 2 . 0) /2 . 0) ; 
loopl (tot) ( 

time=pow  (2.0,  (double) level) * (double)  (1-tot /2) / (double) N_DELTA_T; 

w->wavelet [1 ] *  (float)  ( (cos (omega ‘time) -cc) ‘ 

e.xp  (-PI‘pow  (time,  2.0))* 

sqrt ( (pow  (2.0,  (double) level) ) ) ) ; 

iw->wavelet [ 1 1  =  (float)  (  (sin (omega ‘time) -cc) ‘ 
e:;p  (-PI  *pow  (time ,  2 . 0)  )  * 
sqrt ( (pow (2 . 0, (double) level) ) ) ) ; 

) 

if  ((fl  =  f open ("rwavelet . dat" , "a") )  ==  NULL)  ( 
printf  ("\n‘“  Can't  create  %s  “  ‘  "rwavelet  .dat")  ; 
exit  (0)  ; 

) 

loopl (tot ) 

fprintf(fl,"%f\n",w->wavelet[ll); 
fprint  f  (f  1 ,  "\n  ")  ; 
f close  (f  1 ) ; 

if  ( (f2  =  f open ( "iwavelet . dat " , "a") )  ==  NULL)  ( 
printf("\n‘“  Can't  create  %s  “*",  "iwavelet  .dat  ”)  ; 
exit  (0)  1 
) 

loopl (tot) 

fprint  f(f.'’,"%f\n",  iw->wavelet  [11); 
fprintf (f2, "\n") ; 
f close  (f2) ; 

) 


/««•<**«•••**  compute  iproduct  »****•*»•*»»»*«•***»♦*/ 

/*  This  function  produces  the  inner  product  of  the  signal 
samples  at  a  particular  time  shift  and  resolution*/ 

^•««*<«*«*i>**«*««**««<**<i>******<<>***«i.*«**>. 

float  compute_iproduct  (wavelets  ‘w,  float  ‘signal,  int  elements) 

{ 

int  i,tot; 
float  val; 

val =0; 

if  (OVERLAP  !=0) 
tot=elements *2 ‘OVERLAP; 
else 

tot=elements; 
loopi  (tot ) 

val  +=w-> wave let (i) ‘signal (i-tot/2]  ; 
val  /=N_DELTA_T; 
return  val ; 

) 

float  squash (float  val,  float  oldmax,  float  oldmin,  float  newmax,  float  newmin) 

{ 
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float  answer; 


answer* ' ( (val-cldmin) / (oldmax-oldmin) ) * (newmax-newmin) ) +newmin; 
return  answer; 

} 

void  compute_activity (grid_element  ‘element,  int  level, int  t,  int  which) 

float  a,exc,inh; 
char  string [10] ; 
grid_element  ‘left, ‘right; 
int  1; 

a=element->activity ; 
inh=0 ; 

exc=eleraent->output+element->input ; 

left=right=element  ; 

if {level>which) 

loopl (level/2) { 

if (left->l_ptr!=NULL)  { 

left=left->l_ptr ; 

inh  +=left->output ; 

) 

if (right->r_ptr I =NULL)  ( 
right=right->r_ptr ; 
inh  +=right->output ; 

) 

) 

element->activity=a+TAU*EPS* (-a+ (MAX-a) ‘exc- (MIN+a) ‘inh‘COMPGAIN)  ; 

/*  printf("activity»%5.3g\r.\n" ,  element ->activity)  ; 
printf("Hit  enter  to  continue"); 
gets (string) ; • / 

) 


void  compute_output (grid_element  ‘element) 

{ 

float  a; 

a=element->activity ; 

/♦  if(a<0) 

element->output*0; 

else 

if  (a>  =  0sia<=l) 
element->output=a ; 
else 

element->output*l ; * / 

element ->output* (a- . 0009<0) ?0 :  (SIGMOID (a- . 0009) )  ; 
) 


main  (int  argc,  char  “argv) 

( 

short  ‘temp, ‘temp2;  /‘  integer  representation  of  original  data  ‘/ 

list  ‘rgrid_list, ‘igrid_list; 

grid_element  ‘node, ‘current_node, ‘top_node; 

grid_element  ‘inode, ‘icurrent_node, *itop_node;  /‘  added  this  ‘/ 

wavelets  *wavelet_list, ‘iwavelet_li3t;  /‘added  iwavelet_list  ‘/ 

int  level , 1 , m, n, i ; 

int  tot_points,  points, elements; 

float  signal  (T0T+0VERLAI>‘2*N_DELTA_T1  ; 

float  newsignal [T0T+0VERLAP‘2‘N_DELTA_T1 ; 

float  shortsignal (TOT+OVERLAP‘2‘N_DELTA_TI ; 

float  max,  min,  energy_tot ,  enercjy_trunc; 

si2e_t  s_len; 

FILE  ‘fl, *f2, ‘f3, ‘f4, ‘f5; 
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char  string[10); 
my Sound* [Sound  new) ; 

/*  get  signal,  temp  has  integer  representation  of  signal  */ 
get_data (Stemp, *temp2, argv [1 ] ) ; 
printf(”Got  data  successfully ! \n“) ; 
loopm (TOT) 

signal [m+OVERLAP*N_DELTA_T] « (float) temp (m) ; 
if  ((fl  *  fopen(“signal.dat*,"w") )  **  NULL)  { 
printf ("\n***  Can't  create  %s  ***", "signal. dat") ; 
exit  (0) ; 

) 

loopm (TOT) 

fprintf (fl, ”%f\n", signal [m+OVERLAP*N_DELTA_Tl ) ; 
rgrid_list*ma)ce_grid()  ; 
igrid_list=ma)ce_grid()  ; 

MALLOC (node, M_TOTAL_DTS,grid_element, "allocating  first  layer  real  nodes"); 
MALLOC (inode, M_TOTAL_DTS, grid_element, "allocating  first  layer  imag  node"); 
MALLOC (wavelet_list, LEVELS, wavelets, "allocating  real  wavelet  list"); 

MALLOC (iwavelet_list, LEVELS, wavelets, "allocating  imag  wavelet  list"); 
loopl (LEVELS) 

get_wavelets  (1, Swavelet_list (1 1 , siwavelet_list [1] ) ; 
printf ("Going  into  main  loop\n") ; 

/*  This  loop  produces  the  inner  products  of  the  real  part  of  the  wavelet  */ 

loopm (M_TOTAL_DTS) ( 

allocate_node (rgrid_list, inode [m] ) ; 

/*  printf ("\ntime  is  %d  going  into  compute_trans\n",m) ;  */ 

level=0; 

tot_points»0; 

compute_trans (level , rgrid_list , inode [m] , wavelet_list, 
isignal [ OVERLAP * N_DELTA_T 1 , inewsignal [ OVERLAP *N_DELTA_T) , 
m,tot_points) ; 

) 

/*  This  loop  produces  the  inner  products  of  the  imag.  part  of  the  wavelet  */ 
loopm (M_TOTAL_DTS) { 

allocate_node (igrid_list, iinode [mj ) ; 

level=0 ; 

tot_points=0; 

compute_trans (level , igrid_list , iinode (m) , iwavelet_list , 
isignal [OVERLAP*N_DELTA_T) , inewsignal ( OVERLAP *N_DELTA_T) , 
m, tot_points) ; 

) 

printf ("starting  squashXn") ; 

/•***  This  loop  finds  max  at  each  time  slice  and  uses  max  to  squash  ****«•/ 

loopm (M_TOTAL_DTS) ( 

top_node=inode [m] ; 

itop_node=iinode [m] ; 

current_node=top_node; 

icurrent_node”top_node; 

rgrid_list->current«top_node; 

igrid_list->current“itop_node; 

max*0; 

loopl (LEVELS) ( 
points”0; 

elements* (int) pow (2.0, (double) (LEVELS-1) ) ; 
loopn (N_DELTA_T/elements) ( 
if ( (current_node->input= 

(float) sqrt (pow ( (double) current_node->input, 2 . 0) + 
pow ( (double) icurrent_node->input, 2.0))) >max) 
max”cur rent_node->input ; 
if (current_node->r_ptr  !=  NULL) ( 
current_node“Current_node->r_ptr ; 
icurrent_node“icurrent_node->r_ptr; 

) 

) 
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if (rgrid_li3t->current->lleg_ptr  !-  NULL) { 
rgridi_list->current«rgridl_list->current->lleg_jptr; 
igrid_list->current«igrid_list->current->lleg_ptr; 
current_node“rgrid_list->current; 
icurrent_node“igrid_list->current ; 

) 

) 

printf  ("Max»%f  at  time  %d - Starting  squash.  .  .\n", max, min, m)  ; 

top_node«tnode (m) ; 

current_node“top_node; 

rgrid_list->current“top_node; 

loopl (LEVELS) ( 

points=0; 

elements” (int)pow (2.0, (double) (LEVELS-1)); 
loopn (N_DELTA_T/elements) { 

current_node->input»squash (current_node->input,max,  0,1.0,0); 
if (current_node->r_ptr  !*  NULL) ( 
current_node=current_node->r_ptr; 

) 

) 

if (rgrid_list->current->lleg_ptr  1=  NULL) ( 
r grid_li St -> cur rent =rgrid_list ->cur rent ->lleg_ptr ; 
current_node=rgrid_li3t->current.• 

1 

) 

) 

printf("end  of  squash  loop\n") ; 

/»l,l:t»*ittll*1,t*t,tili»t»i,ttt********»**t»*****»*»*lt*lt******t»***ttti,f******lttt/ 

/ttttttt*******  the  following  computes  lateral  inhibition  •***•****•»*•*••»/ 
loopm(M_TOTAL_DTS) { 

printf ("\nstarting  LIN  at  time  %d\n",m) ; 

loopi  (50)  ( 

top_node=4node [m] ; 

current_node”top_node; 

rgrid_list->current”top_node; 

loopl (LEVELS) { 

elements” (int) pow (2 . 0, (double) (LEVELS-1) ) ; 
loopn (N_DELTA_T/elements) ( 

compute_activity (current_node, 1, i, atoi (argv[2) ) ) ; 
if (current_node->r_ptr  !=  NULL) ( 
current_node”Current_node->r_ptr; 

) 

) 

if (rgrid_list->current->lleg_ptr  !”  NULL) ( 
rgrid_list->current=rgrid_list->current->lleg_ptr; 
current_node=rgrid_list->current  ; 

} 

) 

top_node”tnode [m] ; 
current_node”top_node; 
rgrid_list->current”top_node; 
loopl (LEVELS) ( 

elements- (int) pow (2 . 0, (double) (LEVELS-!)); 
loopn (N_DELTA_T/elements) ( 
compute_output (current_node) ; 
if (current_node->r_ptr  !-  NULL) ( 
current_node“Current_node->r_ptr; 

) 

) 

if (rgrid_list->current->lleg_ptr  !-  NULL) ( 
rgrid_list->current-rgrid_list->current->lleg_ptr; 
current_node-rgrid_list->current ; 

) 

) 
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end  of  lateral  inhibition  »•*.**•*»••**•******••»•*******•/ 
/••«.«»*«•»«•••  the  following  energy  normalizes  •*•*«»**••****•»*/ 
printf ("\nstarting  energy  normalization\n",m) ; 
loopm(M_TOTAL_DTS) { 
top_node“«node [m] ; 
itop_node“tinode [m] ; 
current_node“top_node; 
icurrent_node=top_node; 
rgrid_list->current«top_node; 
i gr id_li St -> current “it op_node; 

/**  find  the  total  energy  and  truncated  energy  at  each  resolution  **/ 
loopl (LEVELS) { 

elements“ (int ) pow (2. 0, (double) (LEVELS-1)); 
loopn (N_DELTA_T/elemcnts) { 

energy_tot  +“  (float) pow ( (double) current_node->input, 2 . 0)  ; 
if (current_node->output ! =0) 

energy_trunc  +“  (float ) pow ( (double) current_node->input,  2 . 0)  ; 
if (current_node->r_ptr  !=  NULL) 
current_node“Current_node->r_ptr ; 


current_node=rgrid_list->current ; 

/**  multiply  both  the  real  and  imag  coefficients  by  the  energy  change  ** / 
loopn (N_DELTA_T/elements) { 

current_node->iproduct_val  *»  energy_tot/energy_trunc; 
icurrent_node->iproduct_val  *•=  energy_tot/energy_trunc; 
i f (current_node->r_ptr  !=  NULL) ( 
current_node=current_node->rjptr; 
icurrent_node=icurrent  node->r_ptr; 

/ 

) 

if (rgrid_list->current->lleg_ptr  !=  NULL) ( 
rgrid_list->current“rgrid_list->current->lleg_ptr; 
igrid_list->current“igrid_list->current->lleg_ptr; 
current_node=rgrid_list->current; 
icurrent_node=igrid_list->current ; 

) 

) 

) 


gnd  Qf  energy  normalization 


A**************************  ! 


the  following  regenerates  signal  from  compressed  grid  **»•***/ 
printf ("\nstarting  to  regenerate  signalsNn") ; 
loopm (M_TOTAL_DTS) { 
top_node=4 node [m) ; 
itop  node”4inode (m) ; 
current_node“top_node; 
i cu  r  rent_node=t op_node  ; 
rgrid_list->current“top_node; 
igrid_list->current“itop_node; 
loopl  (LEVELS)  ( 
points“0; 

elements" (int) pow (2 . 0, (double) (LEVELS-1)); 
loopn (N_DELTA_T/elements) { 
i f (current_node->output ! “0) ( 

regenerate_signal (tshortsignal [ OVERLAP *N_DELTA_T) , 

4wavelet_list (1) , 

elements, points, m, current_node->iproduct_val) ; 
regenerate_signal  (tshortsignal [ OVERLAP *N_DELTA_TJ , 
t i wa velet_l ist ( 1 ) , 

elements, points, m, icurrcnt_node->iproduct_val)  ; 

) 

else  ( 
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regenerate_3ignal (Ishortsignal (OVERIAP*N_DELTA_T) , 
swavelet_list (1] , 
elements, points, m, 0) ; 

regenerate_signal (Cshortsignal f OVERLAP *N_DELTA_TJ , 
s iwavelet_list [1] , 
elements, points,®, 0) ; 

) 

points  +«elements; 
if (current_node->r_ptr  !»  NULL) { 
current_node«current_node->r_ptr; 
icurrent_node“icurrent_node->r_ptr; 

) 

) 

if (rgrid_list->current->lleg_ptr  !-  NULL) { 
rgrid_list->current-rgrid_li3t->current->lleg_ptr; 
igrid_li3t->current«igrid_list->currcnt->lleg_ptr; 
current_node“rgrid_list->current; 
i cur r ent_node“igr id_list-> current ; 

) 


printf ("regenerated  short  signalXn"); 

/•**»************«******lr**<«**l>**<«*««*»»****««*****0«***>«***<«*«****<*«**«*/ 

/**********  the  following  outputs  the  grid  and  compressed  grid  outputs  *•»**»*/ 
if  ( (f3  =  fopen ("grid.dat", "w") )  “  NOLL)  ( 
printf (”\n***  Can't  create  %s  •**", "grid.dat") ; 
exit  (0)  ; 

) 

if  ((f4  =  fopen ("compressed.dat", "w”) )  ==  NOLL)  { 
printf ("\n***  Can't  create  %s  ***", "compressed.dat") ; 
exit  (0)  ; 

) 

loopm(M_TOTAL_DTS) ( 
top_node=inode (m) ; 
current_node=t op_node; 
rgrid_list->current<=top_node; 
loopl (LEVELS) ( 
points=0; 

elements* (int) pow (2.0, (double) (LEVELS-1) ) ; 
loopn (N_DELTA_T/elements) { 

if (current_node->output !“0)  /"change  this  if  */ 
fprintf (f 4, "%5 . 3g  ", current_node->input) ; 
else 

fprintf (f4,"%5.3g  ",0) ; 

fprintf (f3,"%5.3g  ",  current_node->input) ; 
i f  (current_noc’  ’r_ptr  !-  NULL)  { 
current_node»current_node->r_ptr; 

) 

) 

fprintf (f3, "\n") ; 
fprintf  (f4,  "\n")  ; 

if (rgrid_list->current->lleg_ptr  !-  NULL) ( 

rgrid_list->current“rgrid_list->current->lleg_ptr; 

current_node“rgrid_list->current; 

) 

) 

) 

printf ("outputted  grid  and  compressed  grid  dataXn"); 

. . . . . 

if  ( (f2  ”  fopen ( "new_signal . dat", "w") )  -«  NULL)  ( 
printf  ("\n***  Can't  create  %s  *  “  "new_signal .  dat")  .- 
exit (0) ; 

) 

loopm (TOT) 

fprintf (f2, "%f \n",newsignal [m+OVERLAF  *N_DELTA_T) ) ; 
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if  ( (f5  “  fopen ("short_signal . dat ", "w") )  NOLL)  { 
printf (“\n***  Can't  create  %s  ** *  ", “short_signal . dat") ; 
exit  (0)  ; 

) 

loopm (TOT) 

fprintf (f5, "%f\n", shortsignal [m+OVERLAP*N_DELTA_T) )  ; 
loopi (TOT) { 

temp [i) “ (short) news ignal [i+OVERLAP*N_DELTA_T] ; 
temp2 [i] - (short) shortsignal [i+OVERLAP*N_DELTA_T] ; 

} 

/*  get  length  of  input  file  not  counting  null  terminator  */ 
s_len”strlen (argv[l] ) ; 

/*  start  name  of  newsoundfile  with  "new_"  */ 
strcpy (newsoundfile, "new_") ; 

/*  start  name  of  shortsoundf ile  with  "short_"  */ 
strcpy (shortsoundfile, "3hort_") ; 

/•  concatenate  input  file  name  to  "new_"  in  newsoundfile [ J  */ 
strcat (newsoundfile, argv ( 1 )) ;  /*  newsoundl 1 -new_f ilename . snd  */ 

/*  concatenate  input  file  name  to  "3hort_"  in  shortsoundfile!)  */ 
strcat  (shortsoundf  ile,  argv  [  1 1 )  ;  /*  shortsoundl ) ''short_f  ilename .  snd  */ 
SNDWriteSoundf ile (newsoundfile, convertStruct) ; 

SNDWriteSoundf ile (shortsoundfile, shortStruct) ; 

f close  (f 1 )  ; 
fclose (f2)  ; 
f close (f 3) ; 
fclose (f4) ; 
fclose  (f 5) ; 
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